METHOD AND DEVICE FOR NOISE REDUCTION

Field of the invention 
[0001] The present invention relates to a method and device for
adaptively reducing noise in speech communication applications.

State of the art 

[0002] In speech communication applications, such as teleconferencing,
hands-free telephony and hearing aids, the presence of background noise
may significantly reduce the intelligibility of the desired speech
signal. Hence, the use of a noise reduction algorithm is necessary.
Multi-microphone systems exploit spatial information in addition to
temporal and spectral information of the desired signal and noise signal
and are thus preferred to single-microphone procedures. For aesthetic
reasons, multi-microphone techniques for, e.g., hearing aid applications
go together with the use of small-sized arrays. Considerable noise
reduction can be achieved with such arrays, but at the expense of an
increased sensitivity to errors in the assumed signal model, such as
microphone mismatch, reverberation, etc. (see e.g. Stadler & Rabinowitz,
"On the potential of fixed arrays for hearing aids", J. Acoust. Soc.
Amer., vol. 94, no. 3, pp. 1332-1342, Sep. 1993). In hearing aids,
microphones are rarely matched in gain and phase. Gain and phase
differences between microphone characteristics can amount to 6 dB and
10°, respectively.

[0003] A widely studied multi-channel adaptive noise reduction algorithm
is the Generalised Sidelobe Canceller (GSC) (see e.g. Griffiths & Jim,
"An alternative approach to linearly constrained adaptive beamforming",
IEEE Trans. Antennas Propag., vol. 30, no. 1, pp. 27-34, Jan. 1982 and
US-5473701, "Adaptive microphone arrays"). The GSC consists of a fixed,
spatial pre-processor, which includes a fixed beamformer and a blocking
matrix, and an adaptive stage based on an Adaptive Noise Canceller (ANC).
The ANC minimises the output noise power while the blocking matrix
should avoid speech leakage into the noise references. The standard GSC
assumes the desired speaker location, the microphone characteristics and
positions to be known, and reflections of the speech signal to be
absent. If these assumptions are fulfilled, it provides an undistorted
enhanced speech signal with minimum residual noise. However, in reality
these assumptions are often violated, resulting in so-called speech
leakage and hence speech distortion. To limit speech distortion, the ANC
is typically adapted during periods of noise only. When used in
combination with small-sized arrays, e.g., in hearing aid applications,
an additional robustness constraint (see Cox et al., "Robust adaptive
beamforming", IEEE Trans. Acoust. Speech and Signal Processing, vol. 35,
no. 10, pp. 1365-1376, Oct. 1987) is required to guarantee performance
in the presence of small errors in the assumed signal model, such as
microphone mismatch. A widely applied method consists of imposing a
Quadratic Inequality Constraint on the ANC (QIC-GSC). For Least Mean
Squares (LMS) updating, the Scaled Projection Algorithm (SPA) is a
simple and effective technique that imposes this constraint. However,
the QIC-GSC obtains this robustness at the expense of less noise
reduction.

[0004] A Multi-channel Wiener Filtering (MWF) technique has been
proposed (see Doclo & Moonen, "GSVD-based optimal filtering for single
and multimicrophone speech enhancement", IEEE Trans. Signal Processing,
vol. 50, no. 9, pp. 2230-2244, Sep. 2002) that provides a Minimum Mean
Square Error (MMSE) estimate of the desired signal portion in one of the
received microphone signals. In contrast to the ANC of the GSC, the MWF
is able to take speech distortion into account in its optimisation
criterion, resulting in the Speech Distortion Weighted Multi-channel
Wiener Filter (SDW-MWF). The (SDW-)MWF technique is based solely on
estimates of the second order statistics of the recorded speech signal
and the noise signal. A robust speech detection is thus again needed. In
contrast to the GSC, the (SDW-)MWF does not make any a priori
assumptions about the signal model, such that no or a less severe
robustness constraint is needed to guarantee performance when used in
combination with small-sized arrays. Especially in complicated noise
scenarios such as multiple noise sources or diffuse noise, the
(SDW-)MWF outperforms the GSC, even when the GSC is supplemented with a
robustness constraint.

[0005] A possible implementation of the (SDW-)MWF is based on a
Generalised Singular Value Decomposition (GSVD) of an input data matrix
and a noise data matrix. A cheaper alternative based on a QR
Decomposition (QRD) has been proposed in Rombouts & Moonen, "QRD-based
unconstrained optimal filtering for acoustic noise reduction", Signal
Processing, vol. 83, no. 9, pp. 1889-1904, Sep. 2003. Additionally, a
subband implementation results in improved intelligibility at a
significantly lower cost compared to the fullband approach. However, in
contrast to the GSC and the QIC-GSC, no cheap stochastic gradient based
implementation of the (SDW-)MWF is available yet. In Nordholm et al.,
"Adaptive microphone array employing calibration signals: an analytical
evaluation", IEEE Trans. Speech, Audio Processing, vol. 7, no. 3, pp.
241-252, May 1999, an LMS based algorithm for the MWF has been
developed. However, said algorithm needs recordings of calibration
signals. Since room acoustics, microphone characteristics and the
location of the desired speaker change over time, frequent
re-calibration is required, making this approach cumbersome and
expensive. An LMS based SDW-MWF has also been proposed that avoids the
need for calibration signals (see Florencio & Malvar, "Multichannel
filtering for optimum noise reduction in microphone arrays", Int. Conf.
on Acoust., Speech, and Signal Proc., Salt Lake City, USA, pp. 197-200,
May 2001). This algorithm, however, relies on some independence
assumptions that are not necessarily satisfied, resulting in degraded
performance.

[0006] The GSC and MWF techniques are now presented in more detail.

Generalised Sidelobe Canceller (GSC) 

[0007] Fig. 1 describes the concept of the Generalised Sidelobe
Canceller (GSC), which consists of a fixed, spatial pre-processor, i.e.
a fixed beamformer A(z) and a blocking matrix B(z), and an ANC. Given M
microphone signals

$u_i[k] = u_i^s[k] + u_i^n[k], \quad i = 1,\ldots,M$ (equation 1)

with $u_i^s[k]$ the desired speech contribution and $u_i^n[k]$ the noise
contribution, the fixed beamformer A(z) (e.g. delay-and-sum) creates a
so-called speech reference

$y_0[k] = y_0^s[k] + y_0^n[k]$ (equation 2)

by steering a beam towards the direction of the desired signal, and
comprising a speech contribution $y_0^s[k]$ and a noise contribution
$y_0^n[k]$. The blocking matrix B(z) creates M-1 so-called noise
references

$y_i[k] = y_i^s[k] + y_i^n[k], \quad i = 1,\ldots,M-1$ (equation 3)

by steering zeroes towards the direction of the desired signal source
such that the noise contributions $y_i^n[k]$ are dominant compared to
the speech leakage contributions $y_i^s[k]$. In the sequel, the
superscripts s and n are used to refer to the speech and the noise
contribution of a signal. During periods of speech + noise, the
references $y_i[k]$, $i = 0,\ldots,M-1$ contain speech + noise. During
periods of noise only, the references only consist of a noise component,
i.e. $y_i[k] = y_i^n[k]$. The second order statistics of the noise
signal are assumed to be quite stationary such that they can be
estimated during periods of noise only.

[0008] To design the fixed, spatial pre-processor, assumptions are made
about the microphone characteristics, the speaker position and the
microphone positions, and furthermore reverberation is assumed to be
absent. If these assumptions are satisfied, the noise references do not
contain any speech, i.e., $y_i^s[k] = 0$, for $i = 1,\ldots,M-1$.
However, in practice, these assumptions are often violated (e.g. due to
microphone mismatch and reverberation) such that speech leaks into the
noise references. To limit the effect of such speech leakage, the ANC
filter $\mathbf{w}_{1:M-1} \in \mathbb{C}^{(M-1)L \times 1}$

$\mathbf{w}_{1:M-1}^H = [\mathbf{w}_1^H \;\; \mathbf{w}_2^H \;\; \cdots \;\; \mathbf{w}_{M-1}^H]$ (equation 4)

where

$\mathbf{w}_i = [w_i[0] \;\; w_i[1] \;\; \cdots \;\; w_i[L-1]]^T$ (equation 5)

with L the filter length, is adapted during periods of noise only. (Note
that in a time-domain implementation the input signals of the adaptive
filter $\mathbf{w}_{1:M-1}$ and the filter $\mathbf{w}_{1:M-1}$ are
real. In the sequel the formulas are generalised to complex input
signals such that they can also be applied to a subband implementation.)
Hence, the ANC filter $\mathbf{w}_{1:M-1}$ minimises the output noise
power, i.e.

$\mathbf{w}_{1:M-1} = \arg\min_{\mathbf{w}_{1:M-1}} E\{|y_0^n[k-\Delta] - \mathbf{w}_{1:M-1}^H \mathbf{y}_{1:M-1}^n[k]|^2\}$ (equation 6)

leading to

$\mathbf{w}_{1:M-1} = E\{\mathbf{y}_{1:M-1}^n[k]\,\mathbf{y}_{1:M-1}^{n,H}[k]\}^{-1} E\{\mathbf{y}_{1:M-1}^n[k]\,y_0^{n,*}[k-\Delta]\}$ (equation 7)

where

$\mathbf{y}_{1:M-1}^{n,H}[k] = [\mathbf{y}_1^{n,H}[k] \;\; \mathbf{y}_2^{n,H}[k] \;\; \cdots \;\; \mathbf{y}_{M-1}^{n,H}[k]]$ (equation 8)

$\mathbf{y}_i^n[k] = [y_i^n[k] \;\; y_i^n[k-1] \;\; \cdots \;\; y_i^n[k-L+1]]^T$ (equation 9)

and where $\Delta$ is a delay applied to the speech reference to allow
for non-causal taps in the filter $\mathbf{w}_{1:M-1}$. The delay
$\Delta$ is usually set to $\lceil L/2 \rceil$, where $\lceil x \rceil$
denotes the smallest integer equal to or larger than x. The subscript
1:M-1 in $\mathbf{w}_{1:M-1}$ and $\mathbf{y}_{1:M-1}$ refers to the
subscripts of the first and the last channel component of the adaptive
filter and input vector, respectively.
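
As an illustration of (eq.7), the following Python sketch forms sample estimates of the two expectations from noise-only data and solves for the ANC filter. It is a minimal sketch under stated assumptions and not the implementation of the invention: the function names `stack_taps` and `anc_wiener`, the use of NumPy and the batch (non-adaptive) estimation are all choices made for this example, and L ≥ 2 is assumed so that the delay never indexes before the first sample.

```python
import numpy as np

def stack_taps(refs, L):
    """Stack L taps of every channel into data vectors (eq. 8 and 9).

    refs: array of shape (C, N), one row per reference channel.
    Returns an array of shape (N - L + 1, C * L); the row for time k is
    [y_1[k], y_1[k-1], ..., y_1[k-L+1], y_2[k], ...].
    """
    C, N = refs.shape
    rows = []
    for k in range(L - 1, N):
        # Reverse the slice so that taps run from newest to oldest sample.
        rows.append(refs[:, k - L + 1:k + 1][:, ::-1].reshape(-1))
    return np.array(rows)

def anc_wiener(noise_refs, speech_ref_noise, L, delay):
    """Sample estimate of eq. 7 from noise-only data (hypothetical helper)."""
    Y = stack_taps(noise_refs, L)                 # rows are y^n_{1:M-1}[k]
    k = np.arange(L - 1, noise_refs.shape[1])     # time index of each row
    d = speech_ref_noise[k - delay]               # y_0^n[k - delta]
    R = Y.T @ Y.conj() / len(k)                   # estimate of E{y y^H}
    r = Y.T @ np.conj(d) / len(k)                 # estimate of E{y y_0^*}
    return np.linalg.solve(R, r)
```

With the filter `w` obtained this way, the noise estimate for every stacked vector is `Y @ w.conj()`, which is subtracted from the delayed speech reference.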

[0009] Under ideal conditions ($y_i^s[k] = 0$, $i = 1,\ldots,M-1$), the
GSC minimises the residual noise while not distorting the desired speech
signal, i.e. $z^s[k] = y_0^s[k-\Delta]$. However, when used in
combination with small-sized arrays, a small error in the assumed signal
model (resulting in $y_i^s[k] \neq 0$, $i = 1,\ldots,M-1$) already
suffices to produce a significantly distorted output speech signal
$z^s[k]$

$z^s[k] = y_0^s[k-\Delta] - \mathbf{w}_{1:M-1}^H \mathbf{y}_{1:M-1}^s[k]$, (equation 10)

even when only adapting during noise-only periods, such that a
robustness constraint on $\mathbf{w}_{1:M-1}$ is required. In addition,
the fixed beamformer A(z) should be designed such that the distortion in
the speech reference $y_0^s[k]$ is minimal for all possible model
errors. In the sequel, a delay-and-sum beamformer is used. For
small-sized arrays, this beamformer offers sufficient robustness against
signal model errors, as it minimises the noise sensitivity. The noise
sensitivity is defined as the ratio of the spatially white noise gain to
the gain of the desired signal and is often used to quantify the
sensitivity of an algorithm against errors in the assumed signal model.
When statistical knowledge is given about the signal model errors that
occur in practice, the fixed beamformer and the blocking matrix can be
further optimised.

[0010] A common approach to increase the robustness of the GSC is to
apply a Quadratic Inequality Constraint (QIC) to the ANC filter
$\mathbf{w}_{1:M-1}$, such that the optimisation criterion (eq.6) of the
GSC is modified into

$\mathbf{w}_{1:M-1} = \arg\min_{\mathbf{w}_{1:M-1}} E\{|y_0^n[k-\Delta] - \mathbf{w}_{1:M-1}^H \mathbf{y}_{1:M-1}^n[k]|^2\}$ (equation 11)

subject to $\mathbf{w}_{1:M-1}^H \mathbf{w}_{1:M-1} \le \beta^2$.

The QIC avoids excessive growth of the filter coefficients
$\mathbf{w}_{1:M-1}$. Hence, it reduces the undesired speech distortion
when speech leaks into the noise references.

The QIC-GSC can be implemented using the adaptive scaled projection
algorithm (SPA): at each update step, the quadratic constraint is
applied to the newly obtained ANC filter by scaling the filter
coefficients by $\beta / \|\mathbf{w}_{1:M-1}\|$ when
$\mathbf{w}_{1:M-1}^H \mathbf{w}_{1:M-1}$ exceeds $\beta^2$. Recently,
Tian et al. implemented the quadratic constraint by using variable
loading ("Recursive least squares implementation for LCMP Beamforming
under quadratic constraint", IEEE Trans. Signal Processing, vol. 49, no.
6, pp. 1138-1145, June 2001). For Recursive Least Squares (RLS), this
technique provides a better approximation to the optimal solution
(eq.11) than the scaled projection algorithm.
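
The scaling step of the SPA can be sketched as follows. This is only an illustrative sketch: the function name `nlms_spa_step`, the normalised LMS step preceding the projection, the step size `step` and the regularisation constant `eps` are assumptions introduced for the example and are not taken from the text above.

```python
import numpy as np

def nlms_spa_step(w, y, d, step, beta, eps=1e-8):
    """One NLMS update of the ANC filter followed by the scaled projection.

    w:    current stacked filter w_{1:M-1}
    y:    stacked noise-reference vector y_{1:M-1}[k]
    d:    delayed speech reference sample y_0[k - delta]
    beta: radius of the quadratic constraint w^H w <= beta^2
    """
    e = d - np.vdot(w, y)                  # output z[k] = d - w^H y
    w = w + step * y * np.conj(e) / (np.vdot(y, y).real + eps)
    norm = np.linalg.norm(w)
    if norm > beta:                        # w^H w exceeds beta^2
        w = w * (beta / norm)              # scale back onto the constraint
    return w, e
```

When the constraint is inactive the step reduces to a plain NLMS update, so the cost of the projection is one norm computation and, at most, one scaling per sample.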

Multi-Channel Wiener Filtering (MWF) 

[0011] The Multi-channel Wiener filtering (MWF) technique provides a
Minimum Mean Square Error (MMSE) estimate of the desired signal portion
in one of the received microphone signals. In contrast to the GSC, this
filtering technique does not make any a priori assumptions about the
signal model and is found to be more robust. Especially in complex noise
scenarios such as multiple noise sources or diffuse noise, the MWF
outperforms the GSC, even when the GSC is supplied with a robustness
constraint.

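[0012] The MWF $\mathbf{w}_{1:M} \in \mathbb{C}^{ML \times 1}$ minimises
the Mean Square Error (MSE) between a delayed version of the (unknown)
speech signal $u_i^s[k-\Delta]$ at the i-th (e.g. first) microphone and
the sum $\mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k]$ of the M filtered
microphone signals, i.e.

$\mathbf{w}_{1:M} = \arg\min_{\mathbf{w}_{1:M}} E\{|u_i^s[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k]|^2\}$, (equation 12)

leading to

$\mathbf{w}_{1:M} = E\{\mathbf{u}_{1:M}[k]\,\mathbf{u}_{1:M}^H[k]\}^{-1} E\{\mathbf{u}_{1:M}[k]\,u_i^{s,*}[k-\Delta]\}$, (equation 13)

with

$\mathbf{w}_{1:M}^H = [\mathbf{w}_1^H \;\; \mathbf{w}_2^H \;\; \cdots \;\; \mathbf{w}_M^H]$, (equation 14)

$\mathbf{u}_{1:M}^H[k] = [\mathbf{u}_1^H[k] \;\; \mathbf{u}_2^H[k] \;\; \cdots \;\; \mathbf{u}_M^H[k]]$, (equation 15)

$\mathbf{u}_i[k] = [u_i[k] \;\; u_i[k-1] \;\; \cdots \;\; u_i[k-L+1]]^T$, (equation 16)

where $u_i[k]$ comprises a speech component and a noise component.

[0013] An equivalent approach consists of estimating a delayed version
of the (unknown) noise signal $u_i^n[k-\Delta]$ in the i-th microphone,
resulting in

$\mathbf{w}_{1:M} = \arg\min_{\mathbf{w}_{1:M}} E\{|u_i^n[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k]|^2\}$ (equation 17)

and

$\mathbf{w}_{1:M} = E\{\mathbf{u}_{1:M}[k]\,\mathbf{u}_{1:M}^H[k]\}^{-1} E\{\mathbf{u}_{1:M}[k]\,u_i^{n,*}[k-\Delta]\}$ (equation 18)

where

$\mathbf{w}_{1:M}^H = [\mathbf{w}_1^H \;\; \mathbf{w}_2^H \;\; \cdots \;\; \mathbf{w}_M^H]$. (equation 19)

The estimate $z[k]$ of the speech component $u_i^s[k-\Delta]$ is then
obtained by subtracting the estimate $\mathbf{w}_{1:M}^H
\mathbf{u}_{1:M}[k]$ of $u_i^n[k-\Delta]$ from the delayed, i-th
microphone signal $u_i[k-\Delta]$, i.e.

$z[k] = u_i[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k]$. (equation 20)

This is depicted in Fig. 2 for $u_i^n[k-\Delta] = u_1^n[k-\Delta]$.
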
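[0014] The residual error energy of the MWF equals

$E\{|e[k]|^2\} = E\{|u_i^s[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k]|^2\}$ (equation 21)

and can be decomposed into

$\underbrace{E\{|u_i^s[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^s[k]|^2\}}_{\varepsilon_d^2} + \underbrace{E\{|\mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^n[k]|^2\}}_{\varepsilon_n^2}$ (equation 22)

where $\varepsilon_d^2$ equals the speech distortion energy and
$\varepsilon_n^2$ the residual noise energy. The design criterion of the
MWF can be generalised to allow for a trade-off between speech
distortion and noise reduction, by incorporating a weighting factor
$\mu$ with $\mu \in [0,\infty]$

$\mathbf{w}_{1:M} = \arg\min_{\mathbf{w}_{1:M}} E\{|u_i^s[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^s[k]|^2\} + \mu E\{|\mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^n[k]|^2\}$ (equation 23)

The solution of (eq.23) is given by

$\mathbf{w}_{1:M} = E\{\mathbf{u}_{1:M}^s[k]\,\mathbf{u}_{1:M}^{s,H}[k] + \mu\,\mathbf{u}_{1:M}^n[k]\,\mathbf{u}_{1:M}^{n,H}[k]\}^{-1} E\{\mathbf{u}_{1:M}^s[k]\,u_i^{s,*}[k-\Delta]\}$ (equation 24)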

[0015] Equivalently, the optimisation criterion for $\mathbf{w}_{1:M}$
in (eq.17) can be modified into

$\mathbf{w}_{1:M} = \arg\min_{\mathbf{w}_{1:M}} E\{|\mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^s[k]|^2\} + \mu E\{|u_i^n[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^n[k]|^2\}$ (equation 25)

resulting in

$\mathbf{w}_{1:M} = E\{\mathbf{u}_{1:M}^n[k]\,\mathbf{u}_{1:M}^{n,H}[k] + \frac{1}{\mu}\,\mathbf{u}_{1:M}^s[k]\,\mathbf{u}_{1:M}^{s,H}[k]\}^{-1} E\{\mathbf{u}_{1:M}^n[k]\,u_i^{n,*}[k-\Delta]\}$ (equation 26)

In the sequel, (eq.26) will be referred to as the Speech Distortion
Weighted Multi-channel Wiener Filter (SDW-MWF). The factor
$\mu \in [0,\infty]$ trades off speech distortion versus noise
reduction. If $\mu = 1$, the MMSE criterion (eq.12) or (eq.17) is
obtained. If $\mu > 1$, the residual noise level will be reduced at the
expense of increased speech distortion. By setting $\mu$ to $\infty$,
all emphasis is put on noise reduction and speech distortion is
completely ignored. Setting $\mu$ to 0, on the other hand, results in no
noise reduction.
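
The step from the criterion (eq.25) to its solution (eq.26) can be made explicit as follows. The shorthand symbols $\mathbf{R}_s$, $\mathbf{R}_n$, $\mathbf{r}_n$ and $\sigma_n^2$ are introduced here only for readability and do not appear in the original text; speech and noise are assumed uncorrelated, as elsewhere in this section.

```latex
% Shorthand (assumed for this sketch):
%   R_s = E{u^s u^{s,H}},  R_n = E{u^n u^{n,H}},
%   r_n = E{u^n u_i^{n,*}[k-\Delta]},  \sigma_n^2 = E{|u_i^n[k-\Delta]|^2}
\begin{align*}
J(\mathbf{w}) &= \mathbf{w}^H \mathbf{R}_s \mathbf{w}
  + \mu \left( \sigma_n^2 - \mathbf{w}^H \mathbf{r}_n
  - \mathbf{r}_n^H \mathbf{w} + \mathbf{w}^H \mathbf{R}_n \mathbf{w} \right), \\
\frac{\partial J}{\partial \mathbf{w}^*} &= \mathbf{R}_s \mathbf{w}
  + \mu \left( \mathbf{R}_n \mathbf{w} - \mathbf{r}_n \right) = \mathbf{0}, \\
\mathbf{w} &= \left( \mathbf{R}_n + \tfrac{1}{\mu} \mathbf{R}_s \right)^{-1} \mathbf{r}_n .
\end{align*}
```

For $\mu \to \infty$ the last line tends to $\mathbf{R}_n^{-1}\mathbf{r}_n$, the filter that ignores speech distortion, while for $\mu \to 0$ the term $\frac{1}{\mu}\mathbf{R}_s$ dominates and $\mathbf{w}$ tends to zero (provided $\mathbf{R}_s$ is invertible), so that the output of (eq.20) is the unprocessed delayed microphone signal.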

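[0016] In practice, the correlation matrix
$E\{\mathbf{u}_{1:M}^s[k]\,\mathbf{u}_{1:M}^{s,H}[k]\}$ is unknown.
During periods of speech, the inputs $u_i[k]$ consist of speech + noise,
i.e., $u_i[k] = u_i^s[k] + u_i^n[k]$, $i = 1,\ldots,M$. During periods
of noise, only the noise component $u_i^n[k]$ is observed. Assuming that
the speech signal and the noise signal are uncorrelated,
$E\{\mathbf{u}_{1:M}^s[k]\,\mathbf{u}_{1:M}^{s,H}[k]\}$ can be estimated
as

$E\{\mathbf{u}_{1:M}^s[k]\,\mathbf{u}_{1:M}^{s,H}[k]\} = E\{\mathbf{u}_{1:M}[k]\,\mathbf{u}_{1:M}^H[k]\} - E\{\mathbf{u}_{1:M}^n[k]\,\mathbf{u}_{1:M}^{n,H}[k]\}$ (equation 27)

where the second order statistics
$E\{\mathbf{u}_{1:M}[k]\,\mathbf{u}_{1:M}^H[k]\}$ are estimated during
speech + noise and the second order statistics
$E\{\mathbf{u}_{1:M}^n[k]\,\mathbf{u}_{1:M}^{n,H}[k]\}$ during periods
of noise only. As for the GSC, a robust speech detection is thus needed.
Using (eq.27), (eq.24) and (eq.26) can be re-written as:

$\mathbf{w}_{1:M} = \left( E\{\mathbf{u}_{1:M}[k]\,\mathbf{u}_{1:M}^H[k]\} + (\mu - 1)\,E\{\mathbf{u}_{1:M}^n[k]\,\mathbf{u}_{1:M}^{n,H}[k]\} \right)^{-1} \times \left( E\{\mathbf{u}_{1:M}[k]\,u_i^*[k-\Delta]\} - E\{\mathbf{u}_{1:M}^n[k]\,u_i^{n,*}[k-\Delta]\} \right)$ (equation 28)

and

$\mathbf{w}_{1:M} = \left( \frac{1}{\mu} E\{\mathbf{u}_{1:M}[k]\,\mathbf{u}_{1:M}^H[k]\} + \left(1 - \frac{1}{\mu}\right) E\{\mathbf{u}_{1:M}^n[k]\,\mathbf{u}_{1:M}^{n,H}[k]\} \right)^{-1} E\{\mathbf{u}_{1:M}^n[k]\,u_i^{n,*}[k-\Delta]\}$ (equation 29)

The Wiener filter may be computed at each time instant k by means of a
Generalised Singular Value Decomposition (GSVD) of a speech + noise and
noise data matrix. A cheaper recursive alternative based on a
QR-decomposition is also available. Additionally, a subband
implementation increases the resulting speech intelligibility and
reduces complexity, making it suitable for hearing aid applications.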
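
The estimation procedure of (eq.27) and (eq.29) can be sketched in Python as below. This is a minimal batch sketch under stated assumptions, not the GSVD or QRD implementation mentioned above: the function name `sdw_mwf` and its argument layout are hypothetical, the speech detector that splits the data into speech + noise and noise-only segments is assumed to exist already, and `mu` is assumed strictly positive.

```python
import numpy as np

def sdw_mwf(U_mix, U_noise, d_noise, mu):
    """Sample version of eq. 29 (hypothetical helper).

    U_mix:   (K1, M*L) stacked input vectors u[k] from speech + noise periods
    U_noise: (K2, M*L) stacked input vectors u^n[k] from noise-only periods
    d_noise: (K2,) samples u_i^n[k - delta] from the same noise-only periods
    mu:      trade-off factor, mu > 0
    """
    K_mix, K_noise = len(U_mix), len(U_noise)
    R_u = U_mix.T @ U_mix.conj() / K_mix            # E{u u^H}
    R_n = U_noise.T @ U_noise.conj() / K_noise      # E{u^n u^{n,H}}
    r_n = U_noise.T @ np.conj(d_noise) / K_noise    # E{u^n u_i^{n,*}}
    R_s = R_u - R_n                                 # eq. 27
    return np.linalg.solve(R_n + R_s / mu, r_n)     # eq. 29
```

The speech estimate of (eq.20) then follows as `d - U @ w.conj()` for delayed microphone samples `d` and stacked inputs `U`. Because `R_s` is obtained as a difference of two sample estimates, it is not guaranteed to be positive semi-definite for short data segments, which is one reason a robust speech detector and sufficiently long averaging are needed.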

Aims of the invention 

[0017] The present invention aims to provide a method and device for
adaptively reducing noise, especially background noise, in speech
enhancement applications, thereby overcoming the problems and drawbacks
of the state-of-the-art solutions.

Summary of the invention 
[0018] The present invention relates to a method to reduce noise in a
noisy speech signal, comprising the steps of

• applying at least two versions of the noisy speech signal to a first
filter, whereby that first filter outputs a speech reference signal and
at least one noise reference signal,

• applying a filtering operation to each of the at least one noise
reference signals, and

• subtracting from the speech reference signal each of the filtered
noise reference signals,

characterised in that the filtering operation is performed with filters
having filter coefficients determined by taking into account speech
leakage contributions in the at least one noise reference signal.

[0019] In a typical embodiment the at least two versions of the noisy
speech signal are signals from at least two microphones picking up the
noisy speech signal.

[0020] Preferably the first filter is a spatial pre-processor filter,
comprising a beamformer filter and a blocking matrix filter.

[0021] In an advantageous embodiment the speech reference signal is
output by the beamformer filter and the at least one noise reference
signal is output by the blocking matrix filter.

[0022] In a preferred embodiment the speech reference signal is delayed
before performing the subtraction step.

[0023] Advantageously a filtering operation is additionally applied to
the speech reference signal, where the filtered speech reference signal
is also subtracted from the speech reference signal.

[0024] In another preferred embodiment the method further comprises the
step of regularly adapting the filter coefficients. Thereby the speech
leakage contributions in the at least one noise reference signal are
taken into account or, alternatively, both the speech leakage
contributions in the at least one noise reference signal and the speech
contribution in the speech reference signal.

[0025] The invention also relates to the use of a method to reduce noise
as described previously in a speech enhancement application.

[0026] In a second object the invention also relates to a signal
processing circuit for reducing noise in a noisy speech signal,
comprising

• a first filter having at least two inputs and arranged for outputting
a speech reference signal and at least one noise reference signal,

• a filter to apply the speech reference signal to and filters to apply
each of the at least one noise reference signals to, and

• summation means for subtracting from the speech reference signal the
filtered speech reference signal and each of the filtered noise
reference signals.

[0027] Advantageously, the first filter is a spatial pre-processor
filter, comprising a beamformer filter and a blocking matrix filter.

[0028] In an alternative embodiment the beamformer filter is a
delay-and-sum beamformer.

[0029] The invention also relates to a hearing device comprising a
signal processing circuit as described. By hearing device is meant an
acoustical hearing aid (either external or implantable) or a cochlear
implant.

Short description of the drawings

[0030] Fig. 1 represents the concept of the Generalised Sidelobe
Canceller.

[0031] Fig. 2 represents an equivalent approach of multi-channel Wiener
filtering.

[0032] Fig. 3 represents a Spatially Pre-processed SDW-MWF.

[0033] Fig. 4 represents the decomposition of SP-SDW-MWF with
$\mathbf{w}_0$ in a multi-channel filter and single-channel postfilter
$\mathbf{e}_1 - \mathbf{w}_0$.

[0034] Fig. 5 represents the set-up for the experiments.

[0035] Fig. 6 represents the influence of $1/\mu$ on the performance of
the SDR-GSC for different gain mismatches at the second microphone.

[0036] Fig. 7 represents the influence of $1/\mu$ on the performance of
the SP-SDW-MWF with $\mathbf{w}_0$ for different gain mismatches at the
second microphone.

[0037] Fig. 8 represents the $\Delta\mathrm{SNR}_{intellig}$ and
$\mathrm{SD}_{intellig}$ for QIC-GSC as a function of $\beta^2$ for
different gain mismatches $\Upsilon_2$ at the second microphone.

[0038] Fig. 9 represents the complexity of TD and FD Stochastic Gradient
(SG) algorithm with LP filter as a function of filter length L per
channel; M=3 (for comparison, the complexity of the standard NLMS ANC
and SPA are depicted too).

[0039] Fig. 10 represents the performance of different FD Stochastic
Gradient (FD-SG) algorithms; (a) Stationary speech-like noise at 90°;
(b) Multi-talker babble noise at 90°.

[0040] Fig. 11 represents the influence of the LP filter on performance
of FD stochastic gradient SP-SDW-MWF ($1/\mu = 0.5$) without
$\mathbf{w}_0$ and with $\mathbf{w}_0$. Babble noise at 90°.

[0041] Fig. 12 represents the convergence behaviour of FD-SG for
$\lambda = 0$ and $\lambda = 0.9998$. The noise source position suddenly
changes from 90° to 180° and vice versa.

[0042] Fig. 13 represents the performance of FD stochastic gradient
implementation of SP-SDW-MWF with LP filter ($\lambda = 0.9998$) in a
multiple noise source scenario.

[0043] Fig. 14 represents the performance of FD SPA in a multiple noise
source scenario.

[0044] Fig. 15 represents the SNR improvement of the frequency-domain
SP-SDW-MWF (Algorithm 2 and Algorithm 4) in a multiple noise source
scenario.

[0045] Fig. 16 represents the speech distortion of the frequency-domain
SP-SDW-MWF (Algorithm 2 and Algorithm 4) in a multiple noise source
scenario.

Detailed description of the invention

[0046] The present invention is now described in detail. First, the
proposed adaptive multi-channel noise reduction technique, referred to
as Spatially Pre-processed Speech Distortion Weighted Multi-channel
Wiener filter, is described.

[0047] A first aspect of the invention is referred to as Speech
Distortion Regularised GSC (SDR-GSC). A new design criterion is
developed for the adaptive stage of the GSC: the ANC design criterion is
supplemented with a regularisation term that limits speech distortion
due to signal model errors. In the SDR-GSC, a parameter $\mu$ is
incorporated that allows for a trade-off between speech distortion and
noise reduction. Focussing all attention towards noise reduction results
in the standard GSC, while, on the other hand, focussing all attention
towards speech distortion results in the output of the fixed beamformer.
In noise scenarios with low SNR, adaptivity in the SDR-GSC can be easily
reduced or excluded by increasing attention towards speech distortion,
i.e., by decreasing the parameter $\mu$ to 0. The SDR-GSC is an
alternative to the QIC-GSC to decrease the sensitivity of the GSC to
signal model errors such as microphone mismatch, reverberation, etc. In
contrast to the QIC-GSC, the SDR-GSC shifts emphasis towards speech
distortion when the amount of speech leakage grows. In the absence of
signal model errors, the performance of the GSC is preserved. As a
result, a better noise reduction performance is obtained for small model
errors, while guaranteeing robustness against large model errors.

[0048] In a next step, the noise reduction performance of the SDR-GSC is
further improved by adding an extra adaptive filtering operation
$\mathbf{w}_0$ on the speech reference signal. This generalised scheme
is referred to as Spatially Pre-processed Speech Distortion Weighted
Multi-channel Wiener Filter (SP-SDW-MWF). The SP-SDW-MWF is depicted in
Fig. 3 and encompasses the MWF as a special case. Again, a parameter
$\mu$ is incorporated in the design criterion to allow for a trade-off
between speech distortion and noise reduction. Focussing all attention
towards speech distortion results in the output of the fixed beamformer.
Also here, adaptivity can be easily reduced or excluded by decreasing
$\mu$ to 0. It is shown that, in the absence of speech leakage and for
infinitely long filter lengths, the SP-SDW-MWF corresponds to a cascade
of an SDR-GSC with a Speech Distortion Weighted Single-channel Wiener
filter (SDW-SWF). In the presence of speech leakage, the SP-SDW-MWF with
$\mathbf{w}_0$ tries to preserve its performance: the SP-SDW-MWF then
contains extra filtering operations that compensate for the performance
degradation due to speech leakage. Hence, in contrast to the SDR-GSC
(and thus also the GSC), performance does not degrade due to microphone
mismatch. Recursive implementations of the (SDW-)MWF exist that are
based on a GSVD or QR decomposition. Additionally, a subband
implementation results in improved intelligibility at a significantly
lower complexity compared to the fullband approach. These techniques can
be extended to implement the SDR-GSC and, more generally, the
SP-SDW-MWF.

[0049] In this invention, cheap time-domain and frequency-domain
stochastic gradient implementations of the SDR-GSC and the SP-SDW-MWF
are proposed as well. Starting from the design criterion of the SDR-GSC,
or more generally, the SP-SDW-MWF, a time-domain stochastic gradient
algorithm is derived. To increase the convergence speed and reduce the
computational complexity, the algorithm is implemented in the
frequency-domain. To reduce the large excess error from which the
stochastic gradient algorithm suffers when used in highly non-stationary
noise, a low pass filter is applied to the part of the gradient estimate
that limits speech distortion. The low pass filter avoids a highly
time-varying distortion of the desired speech component while not
degrading the tracking performance needed in time-varying noise
scenarios. Experimental results show that the low pass filter
significantly improves the performance of the stochastic gradient
algorithm and does not compromise the tracking of changes in the noise
scenario. In addition, experiments demonstrate that the proposed
stochastic gradient algorithm preserves the benefit of the SP-SDW-MWF
over the QIC-GSC, while its computational complexity is comparable to
the NLMS based scaled projection algorithm for implementing the QIC. The
stochastic gradient algorithm with low pass filter, however, requires
data buffers, which results in a large memory cost. The memory cost can
be decreased by approximating the regularisation term in the
frequency-domain using (diagonal) correlation matrices, making an
implementation of the SP-SDW-MWF in commercial hearing aids feasible
both in terms of complexity and memory cost. Experimental results show
that the stochastic gradient algorithm using correlation matrices has
the same performance as the stochastic gradient algorithm with low pass
filter.

Spatially pre-processed SDW Multi-channel Wiener Filter

Concept

[0050] Fig. 3 depicts the Spatially pre-processed, Speech Distortion
Weighted Multi-channel Wiener filter (SP-SDW-MWF). The SP-SDW-MWF
consists of a fixed, spatial pre-processor, i.e. a fixed beamformer A(z)
and a blocking matrix B(z), and an adaptive Speech Distortion Weighted
Multi-channel Wiener filter (SDW-MWF). Given M microphone signals

$u_i[k] = u_i^s[k] + u_i^n[k], \quad i = 1,\ldots,M$ (equation 30)

with $u_i^s[k]$ the desired speech contribution and $u_i^n[k]$ the noise
contribution, the fixed beamformer A(z) creates a so-called speech
reference

$y_0[k] = y_0^s[k] + y_0^n[k]$, (equation 31)

by steering a beam towards the direction of the desired signal, and
comprising a speech contribution $y_0^s[k]$ and a noise contribution
$y_0^n[k]$. To preserve the robustness advantage of the MWF, the fixed
beamformer A(z) should be designed such that the distortion in the
speech reference $y_0^s[k]$ is minimal for all possible errors in the
assumed signal model such as microphone mismatch. In the sequel, a
delay-and-sum beamformer is used. For small-sized arrays, this
beamformer offers sufficient robustness against signal model errors as
it minimises the noise sensitivity. Given statistical knowledge about
the signal model errors that occur in practice, a further optimised
filter-and-sum beamformer A(z) can be designed. The blocking matrix B(z)
creates M-1 so-called noise references

$y_i[k] = y_i^s[k] + y_i^n[k], \quad i = 1,\ldots,M-1$ (equation 32)

by steering zeroes towards the direction of interest such that the noise
contributions $y_i^n[k]$ are dominant compared to the speech leakage
contributions $y_i^s[k]$. A simple technique to create the noise
references consists of pairwise subtracting the time-aligned microphone
signals. Further optimised noise references can be created, e.g. by
minimising speech leakage for a specified angular region around the
direction of interest instead of for the direction of interest only
(e.g. for an angular region from -20° to 20° around the direction of
interest). In addition, given statistical knowledge about the signal
model errors that occur in practice, speech leakage can be minimised for
all possible signal model errors.
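
The simplest pre-processor described above, a delay-and-sum beamformer combined with pairwise subtraction, can be sketched as follows. The function name `spatial_preprocessor` is hypothetical, the microphone signals are assumed to be already time-aligned towards the direction of interest, and the normalisation of the beamformer output by the number of microphones is an assumption of this example.

```python
import numpy as np

def spatial_preprocessor(mics):
    """Fixed pre-processor sketch for time-aligned microphone signals.

    mics: array of shape (M, N), one row per microphone.
    Returns the speech reference y0 (average of the aligned signals) and
    the M-1 noise references obtained by pairwise subtraction.
    """
    y0 = mics.mean(axis=0)               # delay-and-sum on aligned inputs
    noise_refs = mics[:-1] - mics[1:]    # u_i[k] - u_{i+1}[k]
    return y0, noise_refs
```

With perfectly matched microphones the speech cancels in every noise reference. A gain mismatch on one microphone leaves a scaled copy of the speech in the adjacent references, which is the speech leakage $y_i^s[k]$ that the adaptive stage has to cope with.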

[0051] In the sequel, the superscripts s and n are used to refer to the
speech and the noise contribution of a signal. During periods of speech
+ noise, the references $y_i[k]$, $i = 0,\ldots,M-1$ contain speech +
noise. During periods of noise only, $y_i[k]$, $i = 0,\ldots,M-1$ only
consist of a noise component, i.e. $y_i[k] = y_i^n[k]$. The second order
statistics of the noise signal are assumed to be quite stationary such
that they can be estimated during periods of noise only.

[0052] The SDW-MWF filter Wo:m-i 

(\ Y 

U ; 

(equation 33) 

with 

<M-Ak] = [^"[k] ^v'^[k■] ... ^"^.m\ (equation 34) 
w,.[A:] = [vV;[0] vv.[l] ... vf,.[Z-l]]' (equation 35) 
25 y".M-m = [y"Vk^ y"{k} ... Ym-.W], (equation 36) 

y,W = b,m yXk-\] ... yXk-L + \]X, (equation 37) 
provides an estimate w^^^,,_,yQ.^,_, [A:] of the noise contribution 
y'^[k-l^] in the speech reference by minimising the cost 
function J {wo,m-i) 
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25 



(equation 38) 

The subscript 0:M-1 in $w_{0:M-1}$ and $y_{0:M-1}$ refers to the
subscripts of the first and the last channel component of
the adaptive filter and the input vector, respectively. The
term $\varepsilon_d^2$ represents the speech distortion energy and
$\varepsilon_n^2$ the residual noise energy. The term
$\frac{1}{\mu}\varepsilon_d^2$ in the cost function
(eq.38) limits the possible amount of speech distortion at
the output of the SP-SDW-MWF. Hence, the SP-SDW-MWF adds
robustness against signal model errors to the GSC by taking
speech distortion explicitly into account in the design
criterion of the adaptive stage. The parameter $1/\mu \in [0,\infty)$
trades off noise reduction and speech distortion: the
larger $1/\mu$, the smaller the amount of possible speech
distortion. For $\mu=0$, the output of the fixed beamformer
A(z), delayed by $\Delta$ samples, is obtained. Adaptivity can be
easily reduced or excluded in the SP-SDW-MWF by decreasing
$\mu$ to 0 (e.g., in noise scenarios with a very low
signal-to-noise ratio (SNR), e.g., -10 dB, a fixed beamformer
may be preferred). Additionally, adaptivity can be limited by
applying a QIC to $w_{0:M-1}$.
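The trade-off governed by $1/\mu$ can be illustrated numerically. The following is only a minimal sketch, assuming real-valued signals and toy correlation matrices; the function name `sdw_mwf` and the variables `Rs`, `Rn` and `r_n` are illustrative names that do not appear in this description. It evaluates the closed form of (eq.33) for several values of $1/\mu$ and reports the resulting speech distortion energy.

```python
import numpy as np

def sdw_mwf(Rs, Rn, r_n, inv_mu):
    """Closed form of (eq.33): solve (1/mu * Rs + Rn) w = r_n for w."""
    return np.linalg.solve(inv_mu * Rs + Rn, r_n)

rng = np.random.default_rng(0)
dim = 6                                  # stands in for the M*L stacked filter taps
A = rng.standard_normal((dim, dim))
Rs = A @ A.T                             # toy speech correlation matrix (assumption)
Rn = np.eye(dim)                         # toy white-noise correlation matrix (assumption)
r_n = rng.standard_normal(dim)           # toy cross-correlation E{y^n y_0^n[k - Delta]}

for inv_mu in (0.0, 0.5, 10.0, 1e6):
    w = sdw_mwf(Rs, Rn, r_n, inv_mu)
    distortion = w @ Rs @ w              # speech distortion energy eps_d^2
    print(f"1/mu = {inv_mu:g}: speech distortion energy = {distortion:.3e}")
```

In this toy setting the distortion energy shrinks as $1/\mu$ grows, and for very large $1/\mu$ the filter tends to zero, so that only the delayed fixed beamformer output remains.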

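[0053] Note that when the fixed beamformer A(z) and
the blocking matrix B(z) are set to

$$A(z) = [\, 1 \;\; 0 \;\; \ldots \;\; 0 \,]^H$$ (equation 39)

$$B(z) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix}^H$$ (equation 40)

one obtains the original SDW-MWF that operates on the
received microphone signals $u_i[k]$, i = 1,...,M.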



[0054] Below, the different parameter settings of
the SP-SDW-MWF are discussed. Depending on the setting of
the parameter $\mu$ and the presence or the absence of the
filter $w_0$, the GSC, the (SDW-)MWF as well as in-between
solutions such as the Speech Distortion Regularised GSC
(SDR-GSC) are obtained. One distinguishes between two
cases, i.e. the case where no filter $w_0$ is applied to the
speech reference (filter length $L_0=0$) and the case where an
additional filter $w_0$ is used ($L_0 \neq 0$).

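SDR-GSC, i.e., SP-SDW-MWF without $w_0$

[0055] First, consider the case without $w_0$, i.e.
$L_0=0$. The solution for $w_{1:M-1}$ in (eq.33) then reduces to

$$w_{1:M-1} = \arg\min_{w_{1:M-1}} \frac{1}{\mu} \underbrace{E\{ |w^H_{1:M-1} y^s_{1:M-1}[k]|^2 \}}_{\varepsilon_d^2} + \underbrace{E\{ |y^n_0[k-\Delta] - w^H_{1:M-1} y^n_{1:M-1}[k]|^2 \}}_{\varepsilon_n^2}$$ (equation 41)

leading to

$$w_{1:M-1} = \left[ \frac{1}{\mu} E\{ y^s_{1:M-1}[k]\, y^{s,H}_{1:M-1}[k] \} + E\{ y^n_{1:M-1}[k]\, y^{n,H}_{1:M-1}[k] \} \right]^{-1} E\{ y^n_{1:M-1}[k]\, y^{n,*}_0[k-\Delta] \}$$ (equation 42)

where $\varepsilon_d^2$ is the speech distortion energy and $\varepsilon_n^2$ the
residual noise energy.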
[0056] Compared to the optimisation criterion (eq.6)
of the GSC, a regularisation term

$$\frac{1}{\mu} E\{ |w^H_{1:M-1} y^s_{1:M-1}[k]|^2 \}$$ (equation 43)

has been added. This regularisation term limits the amount
of speech distortion that is caused by the filter $w_{1:M-1}$
when speech leaks into the noise references, i.e.
$y^s_i[k] \neq 0$, i = 1,...,M-1. In the sequel, the SP-SDW-MWF with $L_0=0$
is therefore referred to as the Speech Distortion
Regularised GSC (SDR-GSC). The smaller $\mu$, the smaller the
resulting amount of speech distortion will be. For $\mu=0$, all
emphasis is put on speech distortion such that z[k] is
equal to the output of the fixed beamformer A(z) delayed by
$\Delta$ samples. For $\mu=\infty$ all emphasis is put on noise reduction
and speech distortion is not taken into account. This
corresponds to the standard GSC. Hence, the SDR-GSC
encompasses the GSC as a special case.

[0057] The regularisation term (eq.43) with $1/\mu \neq 0$
adds robustness to the GSC, while not affecting the noise
reduction performance in the absence of speech leakage:
• In the absence of speech leakage, i.e.,
$y^s_i[k] = 0$, i = 1,...,M-1, the regularisation term equals 0
for all $w_{1:M-1}$ and hence the residual noise energy $\varepsilon_n^2$
is effectively minimised. In other words, in the
absence of speech leakage, the GSC solution is
obtained.

• In the presence of speech leakage, i.e.,
$y^s_i[k] \neq 0$, i = 1,...,M-1, speech distortion is explicitly
taken into account in the optimisation criterion
(eq.41) for the adaptive filter $w_{1:M-1}$, limiting speech
distortion while reducing noise. The larger the amount
of speech leakage, the more attention is paid to
speech distortion.
To limit speech distortion alternatively, a QIC is often
imposed on the filter $w_{1:M-1}$. In contrast to the SDR-GSC,
the QIC acts irrespective of the amount of speech leakage
$y^s[k]$ that is present. The constraint value $\beta^2$ in (eq.11)
has to be chosen based on the largest model errors that may
occur. As a consequence, noise reduction performance is
compromised even when no or very small model errors are
present. Hence, the QIC is more conservative than the
SDR-GSC, as will be shown in the experimental results.
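The two cases listed above can be checked on a toy problem. This is a hedged sketch only, assuming real-valued statistics; the helper `sdr_gsc` and the vectors `leak` and `r_n` are hypothetical names introduced here for illustration. Without speech leakage the speech correlation matrix is zero and every $1/\mu$ returns the GSC filter; with a rank-one leakage term the regularised filter produces less speech distortion than the GSC filter.

```python
import numpy as np

def sdr_gsc(Rs, Rn, r_n, inv_mu):
    """Closed form of (eq.42): [1/mu * Rs + Rn]^{-1} r_n."""
    return np.linalg.solve(inv_mu * Rs + Rn, r_n)

rng = np.random.default_rng(1)
dim = 4
B = rng.standard_normal((dim, dim))
Rn = B @ B.T + np.eye(dim)               # toy noise correlation matrix (assumption)
r_n = rng.standard_normal(dim)           # toy noise cross-correlation vector
no_leak = np.zeros((dim, dim))           # y_i^s[k] = 0 in all noise references

w_gsc = sdr_gsc(no_leak, Rn, r_n, 0.0)   # 1/mu = 0: standard GSC solution
w_no_leak = sdr_gsc(no_leak, Rn, r_n, 5.0)

leak = rng.standard_normal(dim)          # toy speech leakage direction
Rs_leak = np.outer(leak, leak)
w_leak = sdr_gsc(Rs_leak, Rn, r_n, 5.0)

print("no leakage, same as GSC:", np.allclose(w_no_leak, w_gsc))
print("distortion with GSC filter:", w_gsc @ Rs_leak @ w_gsc)
print("distortion with SDR-GSC   :", w_leak @ Rs_leak @ w_leak)
```

A QIC, by contrast, would bound the filter norm in both cases, whether or not leakage is present.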
SP-SDW-MWF with filter $w_0$

[0058] Since the SDW-MWF (eq.33) takes speech
distortion explicitly into account in its optimisation
criterion, an additional filter $w_0$ on the speech reference
$y_0[k]$ may be added. The SDW-MWF (eq.33) then solves the
following more general optimisation criterion

$$w_{0:M-1} = \arg\min_{w_{0:M-1}} \frac{1}{\mu} \underbrace{E\{ |w^H_{0:M-1} y^s_{0:M-1}[k]|^2 \}}_{\varepsilon_d^2} + \underbrace{E\{ |y^n_0[k-\Delta] - w^H_{0:M-1} y^n_{0:M-1}[k]|^2 \}}_{\varepsilon_n^2}$$ (equation 44)

where $w^H_{0:M-1} = [\, w^H_0 \;\; w^H_{1:M-1} \,]$ is given by (eq.33).

[0059] Again, $\mu$ trades off speech distortion and
noise reduction. For $\mu=\infty$ speech distortion $\varepsilon_d^2$ is completely
ignored, which results in a zero output signal. For $\mu=0$ all
emphasis is put on speech distortion such that the output
signal is equal to the output of the fixed beamformer
delayed by $\Delta$ samples.

In addition, the observation can be made that in the
absence of speech leakage, i.e., $y^s_i[k] = 0$, i = 1,...,M-1, and
for infinitely long filters $w_i$, i = 0,...,M-1, the SP-SDW-MWF
(with $w_0$) corresponds to a cascade of an SDR-GSC and an SDW
single-channel WF (SDW-SWF) postfilter. In the presence of
speech leakage, the SP-SDW-MWF (with $w_0$) tries to preserve
its performance: the SP-SDW-MWF then contains extra
filtering operations that compensate for the performance
degradation due to speech leakage. This is illustrated in
Fig. 4. It can e.g. be proven that, for infinite filter
lengths, the performance of the SP-SDW-MWF (with $w_0$) is not
affected by microphone mismatch as long as the desired
speech component at the output of the fixed beamformer A(z)
remains unaltered.

Experimental results
[0060] The theoretical results are now illustrated
by means of experimental results for a hearing aid
application. First, the set-up and the performance measures
used are described. Next, the impact of the different
parameter settings of the SP-SDW-MWF on the performance and
the sensitivity to signal model errors is evaluated.
Comparison is made with the QIC-GSC.

[0061] Fig. 5 depicts the set-up for the
experiments. A three-microphone Behind-The-Ear (BTE)
hearing aid with three omnidirectional microphones (Knowles
FG-3452) has been mounted on a dummy head in an office
room. The interspacing between the first and the second
microphone is about 1 cm and the interspacing between the
second and the third microphone is about 1.5 cm. The
reverberation time $T_{60dB}$ of the room is about 700 ms for a
speech weighted noise. The desired speech signal and the
noise signals are uncorrelated. Both the speech and the
noise signal have a level of 70 dB SPL at the centre of the
head. The desired speech source and noise sources are
positioned at a distance of 1 meter from the head: the
speech source in front of the head (0°), the noise sources
at an angle $\theta$ w.r.t. the speech source (see also Fig. 5).
To get an idea of the average performance based on
directivity only, stationary speech and noise signals with
the same, average long-term power spectral density are
used. The total duration of the input signal is 10 seconds
of which 5 seconds contain noise only and 5 seconds contain
both the speech and the noise signal. For evaluation
purposes, the speech and the noise signal have been
recorded separately.
[0062] The microphone signals are pre-whitened prior
to processing to improve intelligibility, and the output is
accordingly de-whitened. In the experiments, the
microphones have been calibrated by means of recordings of
an anechoic speech weighted noise signal positioned at 0°,
measured while the microphone array is mounted on the head.
A delay-and-sum beamformer is used as a fixed beamformer,
since -in case of small microphone interspacing- it is
known to be very robust to model errors. The blocking
matrix B pairwise subtracts the time aligned calibrated
microphone signals.

[0063] To investigate the effect of the different
parameter settings (i.e. $\mu$, $w_0$) on the performance, the
filter coefficients are computed using (eq.33) where
$E\{ y^s_{0:M-1}\, y^{s,H}_{0:M-1} \}$ is estimated by means of the clean speech
contributions of the microphone signals. In practice,
$E\{ y^s_{0:M-1}\, y^{s,H}_{0:M-1} \}$ is approximated using (eq.27). The effect of
the approximation (eq.27) on the performance was found to
be small (i.e. differences of at most 0.5 dB in
intelligibility weighted SNR improvement) for the given
data set. The QIC-GSC is implemented using variable loading
RLS. The filter length L per channel equals 96.
[0064] To assess the performance of the different
approaches, the broadband intelligibility weighted SNR
improvement is used, defined as

$$\Delta \mathrm{SNR}_{intellig} = \sum_i I_i \, ( \mathrm{SNR}_{i,out} - \mathrm{SNR}_{i,in} )$$ (equation 45)

where the band importance function $I_i$ expresses the
importance of the i-th one-third octave band with centre
frequency $f_i^c$ for intelligibility, $\mathrm{SNR}_{i,out}$ is the output SNR
(in dB) and $\mathrm{SNR}_{i,in}$ is the input SNR (in dB) in the i-th
one-third octave band ("ANSI S3.5-1997, American National
Standard Methods for Calculation of the Speech
Intelligibility Index"). The intelligibility weighted SNR
reflects how much intelligibility is improved by the noise
reduction algorithm, but does not take into account speech
distortion.
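As a worked illustration of (eq.45), the weighted sum can be written in a few lines. This is a sketch under stated assumptions: the function name `delta_snr_intellig` is hypothetical, and the three band importance values below are made-up demonstration numbers, not the values tabulated in ANSI S3.5-1997.

```python
import numpy as np

def delta_snr_intellig(band_importance, snr_out_db, snr_in_db):
    """Intelligibility weighted SNR improvement of (eq.45), in dB."""
    importance = np.asarray(band_importance, dtype=float)
    gain_db = np.asarray(snr_out_db, dtype=float) - np.asarray(snr_in_db, dtype=float)
    return float(np.sum(importance * gain_db))

# Hypothetical three-band example (weights sum to 1 for readability only).
I_demo = [0.2, 0.5, 0.3]
snr_out_demo = [8.0, 10.0, 6.0]          # output SNR per band, dB
snr_in_demo = [2.0, 3.0, 4.0]            # input SNR per band, dB
improvement = delta_snr_intellig(I_demo, snr_out_demo, snr_in_demo)
print(f"weighted SNR improvement: {improvement:.2f} dB")
```

Bands with a high importance value dominate the result, so an improvement confined to unimportant bands contributes little to the measure.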

[0065] To measure the amount of speech distortion,
we define the following intelligibility weighted spectral
distortion measure

$$\mathrm{SD}_{intellig} = \sum_i I_i \, \mathrm{SD}_i$$ (equation 46)

with $\mathrm{SD}_i$ the average spectral distortion (dB) in the i-th
one-third octave band, measured as

$$\mathrm{SD}_i = \int_{2^{-1/6} f_i^c}^{2^{1/6} f_i^c} | 10 \log_{10} G^s(f) | \, df \; \Big/ \; \left[ ( 2^{1/6} - 2^{-1/6} ) f_i^c \right]$$ (equation 47)

with $G^s(f)$ the power transfer function of speech from the
input to the output of the noise reduction algorithm. To
exclude the effect of the spatial pre-processor, the
performance measures are calculated w.r.t. the output of
the fixed beamformer.
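The band average in (eq.47) can be approximated numerically. The sketch below is an illustration only: the function names, the callable `power_gain` standing in for $G^s(f)$, and the trapezoidal integration on a uniform frequency grid are assumptions made here, not part of the described method.

```python
import numpy as np

def band_spectral_distortion(power_gain, fc, n_points=2001):
    """Approximate SD_i of (eq.47) for the one-third octave band centred at fc."""
    lo = 2.0 ** (-1.0 / 6.0) * fc
    hi = 2.0 ** (1.0 / 6.0) * fc
    f = np.linspace(lo, hi, n_points)
    integrand = np.abs(10.0 * np.log10(power_gain(f)))
    # Trapezoidal rule written out explicitly.
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(f))
    return float(integral / (hi - lo))

def sd_intellig(band_importance, power_gain, centre_freqs):
    """Intelligibility weighted spectral distortion of (eq.46)."""
    return sum(I * band_spectral_distortion(power_gain, fc)
               for I, fc in zip(band_importance, centre_freqs))

def half_power(f):
    """Toy transfer function: speech power halved at every frequency."""
    return np.full_like(f, 0.5)

print(band_spectral_distortion(half_power, 1000.0))
print(sd_intellig([0.4, 0.6], half_power, [500.0, 1000.0]))
```

Because the absolute value is taken inside the integral, amplification and attenuation of the speech component both count as distortion.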

[0066] The impact of the different parameter
settings for $\mu$ and $w_0$ on the performance of the SP-SDW-MWF
is illustrated for a five noise source scenario. The five
noise sources are positioned at angles 75°, 120°, 180°,
240°, 285° w.r.t. the desired source at 0°. To assess the
sensitivity of the algorithm against errors in the assumed
signal model, the influence of microphone mismatch, e.g.,
gain mismatch of the second microphone, on the performance
is evaluated. Among the different possible signal model
errors, microphone mismatch was found to be especially
harmful to the performance of the GSC in a hearing aid
application. In hearing aids, microphones are rarely
matched in gain and phase. Gain and phase differences
between microphone characteristics of up to 6 dB and 10°,
respectively, have been reported.
SP-SDW-MWF without $w_0$ (SDR-GSC)

[0067] Fig. 6 plots the improvement $\Delta \mathrm{SNR}_{intellig}$ and
the speech distortion $\mathrm{SD}_{intellig}$ as a function of $1/\mu$
obtained by the SDR-GSC (i.e., the SP-SDW-MWF without
filter $w_0$) for different gain mismatches at the second
microphone. In the absence of microphone mismatch, the
amount of speech leakage into the noise references is
limited. Hence, the amount of speech distortion is low for
all $\mu$. Since there is still a small amount of speech
leakage due to reverberation, the amount of noise reduction
and speech distortion slightly decreases for increasing
$1/\mu$, especially for $1/\mu > 1$. In the presence of microphone
mismatch, the amount of speech leakage into the noise
references grows. For $1/\mu=0$ (GSC), the speech gets
significantly distorted. Due to the cancellation of the
desired signal, also the improvement $\Delta \mathrm{SNR}_{intellig}$ degrades.
Setting $1/\mu>0$ improves the performance of the GSC in the
presence of model errors without compromising performance
in the absence of signal model errors. For the given
set-up, a value $1/\mu$ around 0.5 seems appropriate for
guaranteeing good performance for a gain mismatch up to
4 dB.

SP-SDW-MWF with filter $w_0$

[0068] Fig. 7 plots the performance measures
$\Delta \mathrm{SNR}_{intellig}$ and $\mathrm{SD}_{intellig}$ of the SP-SDW-MWF with filter $w_0$. In
general, the amount of speech distortion and noise
reduction grows for decreasing $1/\mu$. For $1/\mu=0$, all emphasis
is put on noise reduction. As also illustrated by Fig. 7,
this results in a total cancellation of the speech and the
noise signal and hence degraded performance. In the absence
of model errors, the settings $L_0=0$ and $L_0 \neq 0$ result - except
for $1/\mu=0$ - in the same $\Delta \mathrm{SNR}_{intellig}$, while the distortion
for the SP-SDW-MWF with $w_0$ is higher due to the additional
single-channel SDW-SWF. For $L_0 \neq 0$ the performance does -in
contrast to $L_0=0$- not degrade due to the microphone
mismatch.

[0069] Fig. 8 depicts the improvement $\Delta \mathrm{SNR}_{intellig}$ and
the speech distortion $\mathrm{SD}_{intellig}$, respectively, of the
QIC-GSC as a function of $\beta^2$. Like the SDR-GSC, the QIC
increases the robustness of the GSC. The QIC is independent
of the amount of speech leakage. As a consequence,
distortion grows fast with increasing gain mismatch. The
constraint value $\beta^2$ should be chosen such that the maximum
allowable speech distortion level is not exceeded for the
largest possible model errors. Obviously, this goes at the
expense of reduced noise reduction for small model errors.
The SDR-GSC on the other hand, keeps the speech distortion
limited for all model errors (see Fig. 6). Emphasis on
speech distortion is increased if the amount of speech
leakage grows. As a result, a better noise reduction
performance is obtained for small model errors, while
guaranteeing sufficient robustness for large model errors.
In addition, Fig. 7 demonstrates that an additional filter
$w_0$ significantly improves the performance in the presence
of signal model errors.

[0070] In the previously discussed embodiments a
generalised noise reduction scheme has been established,
referred to as Spatially Pre-processed, Speech Distortion
Weighted Multi-channel Wiener Filter (SP-SDW-MWF), that
comprises a fixed, spatial pre-processor and an adaptive
stage that is based on a SDW-MWF. The new scheme
encompasses the GSC and MWF as special cases. In addition,
it allows for an in-between solution that can be
interpreted as a Speech Distortion Regularised GSC
(SDR-GSC). Depending on the setting of a trade-off parameter $\mu$
and the presence or absence of the filter $w_0$ on the speech
reference, the GSC, the SDR-GSC or a (SDW-)MWF is obtained.
The different parameter settings of the SP-SDW-MWF can be
interpreted as follows:
• Without $w_0$, the SP-SDW-MWF corresponds to an
SDR-GSC: the ANC design criterion is supplemented with
a regularisation term that limits the speech
distortion due to signal model errors. The larger $1/\mu$,
the smaller the amount of distortion. For $1/\mu=0$,
distortion is completely ignored, which corresponds to
the GSC-solution. The SDR-GSC is then an alternative
technique to the QIC-GSC to decrease the sensitivity
of the GSC to signal model errors. In contrast to the
QIC-GSC, the SDR-GSC shifts emphasis towards speech
distortion when the amount of speech leakage grows. In
the absence of signal model errors, the performance of
the GSC is preserved. As a result, a better noise
reduction performance is obtained for small model
errors, while guaranteeing robustness against large
model errors.

• Since the SP-SDW-MWF takes speech distortion
explicitly into account, a filter $w_0$ on the speech
reference can be added. It can be shown that -in the
absence of speech leakage and for infinitely long
filter lengths- the SP-SDW-MWF corresponds to a
cascade of an SDR-GSC with an SDW-SWF postfilter. In
the presence of speech leakage, the SP-SDW-MWF with $w_0$
tries to preserve its performance: the SP-SDW-MWF then
contains extra filtering operations that compensate
for the performance degradation due to speech leakage.
In contrast to the SDR-GSC (and thus also the GSC),
the performance does not degrade due to microphone
mismatch.
Experimental results for a hearing aid application confirm
the theoretical results. The SP-SDW-MWF indeed increases
the robustness of the GSC against signal model errors. A
comparison with the widely studied QIC-GSC demonstrates
that the SP-SDW-MWF achieves a better noise reduction
performance for a given maximum allowable speech distortion
level.

Stochastic gradient implementations

[0071] Recursive implementations of the (SDW-)MWF
have been proposed based on a GSVD or QR decomposition.
Additionally, a subband implementation results in improved
intelligibility at a significantly lower cost compared to
the fullband approach. These techniques can be extended to
implement the SP-SDW-MWF. However, in contrast to the GSC
and the QIC-GSC, no cheap stochastic gradient based
implementation of the SP-SDW-MWF is available. In the
present invention, time-domain and frequency-domain
stochastic gradient implementations of the SP-SDW-MWF are
proposed that preserve the benefit of the matrix-based
SP-SDW-MWF over the QIC-GSC. Experimental results demonstrate
that the proposed stochastic gradient implementations of the
SP-SDW-MWF outperform the SPA, while their computational cost
is limited.

[0072] Starting from the cost function of the
SP-SDW-MWF, a time-domain stochastic gradient algorithm is
derived. To increase the convergence speed and reduce the
computational complexity, the stochastic gradient algorithm
is implemented in the frequency-domain. Since the
stochastic gradient algorithm suffers from a large excess
error when applied in highly time-varying noise scenarios,
the performance is improved by applying a low pass filter
to the part of the gradient estimate that limits speech
distortion. The low pass filter avoids a highly time-varying
distortion of the desired speech component while
not degrading the tracking performance needed in time- 
varying noise scenarios. Next, the performance of the 
different frequency-domain stochastic gradient algorithms 
is compared. Experimental results show that the proposed 
stochastic gradient algorithm preserves the benefit of the 
SP-SDW-MWF over the QIC-GSC. Finally, it is shown that the 
memory cost of the frequency-domain stochastic gradient 
algorithm with low pass filter is reduced by approximating 
the regularisation term in the frequency-domain using 
(diagonal) correlation matrices instead of data buffers. 
Experiments show that the stochastic gradient algorithm 
using correlation matrices has the same performance as the 
stochastic gradient algorithm with low pass filter. 

Stochastic gradient algorithm 

Derivation 

[0073] A stochastic gradient algorithm approximates
the steepest descent algorithm, using an instantaneous
gradient estimate. Given the cost function (eq.38), the
steepest descent algorithm iterates as follows (note that
in the sequel the subscripts 0:M-1 in the adaptive filter
$w_{0:M-1}$ and the input vector $y_{0:M-1}$ are omitted for the sake of
conciseness):

$$w[n+1] = w[n] + \rho \left( E\{ y^n[k]\, y^{n,*}_0[k-\Delta] \} - E\{ y^n[k]\, y^{n,H}[k] \}\, w[n] - \frac{1}{\mu} E\{ y^s[k]\, y^{s,H}[k] \}\, w[n] \right)$$ (equation 48)

with $w[k], y[k] \in \mathbb{C}^{NL \times 1}$, where N denotes the number of input
channels to the adaptive filter and L the number of filter
taps per channel. Replacing the iteration index n by a time
index k and leaving out the expectation values E{.}, one
obtains the following update equation

$$w[k+1] = w[k] + \rho \left\{ y^n[k] \left( y^{n,*}_0[k-\Delta] - y^{n,H}[k]\, w[k] \right) - \underbrace{\frac{1}{\mu}\, y^s[k]\, y^{s,H}[k]\, w[k]}_{r[k]} \right\}$$ (equation 49)

For $1/\mu=0$ and no filter $w_0$ on the speech reference, (eq.49)
reduces to the update formula used in GSC during periods of
noise only (i.e., when $y_i[k] = y^n_i[k]$, i = 0,...,M-1). The additional
term r[k] in the gradient estimate limits the speech
distortion due to possible signal model errors.
[0074] Equation (49) requires knowledge of the
correlation matrix $y^s[k]\, y^{s,H}[k]$ or $E\{ y^s[k]\, y^{s,H}[k] \}$ of the clean
speech. In practice, this information is not available. To
avoid the need for calibration, speech + noise signal
vectors $y_{buf_1}$ are stored into a circular buffer $B_1 \in \mathbb{R}^{N \times L_{buf_1}}$
during processing. During periods of noise only (i.e., when
$y_i[k] = y^n_i[k]$, i = 0,...,M-1), the filter w is updated using the
following approximation of the term $r[k] = \frac{1}{\mu}\, y^s[k]\, y^{s,H}[k]\, w[k]$ in
(eq.49)

$$\frac{1}{\mu}\, y^s[k]\, y^{s,H}[k]\, w[k] \approx \frac{1}{\mu} \left( y_{buf_1}[k]\, y^H_{buf_1}[k] - y^n[k]\, y^{n,H}[k] \right) w[k]$$ (equation 50)

which results in the update formula

$$w[k+1] = w[k] + \rho \left\{ y^n[k] \left( y^{n,*}_0[k-\Delta] - y^{n,H}[k]\, w[k] \right) - \underbrace{\frac{1}{\mu} \left( y_{buf_1}[k]\, y^H_{buf_1}[k] - y^n[k]\, y^{n,H}[k] \right) w[k]}_{r[k]} \right\}$$ (equation 51)

In the sequel, a normalised step size $\rho$ is used, i.e.

$$\rho = \frac{\rho'}{\frac{1}{\mu} \left| y^H_{buf_1}[k]\, y_{buf_1}[k] - y^{n,H}[k]\, y^n[k] \right| + y^{n,H}[k]\, y^n[k] + \delta}$$ (equation 52)
where $\delta$ is a small positive constant. The absolute value
$| y^H_{buf_1} y_{buf_1} - y^{n,H} y^n |$ has been inserted to guarantee a positive
valued estimate of the clean speech energy $y^{s,H}[k]\, y^s[k]$.
Additional storage of noise only vectors $y_{buf_2}$ in a second
buffer $B_2 \in \mathbb{R}^{N \times L_{buf_2}}$ allows w to be adapted also during
periods of speech + noise, using

$$w[k+1] = w[k] + \rho \left\{ y_{buf_2}[k] \left( y^{*}_{0,buf_2}[k-\Delta] - y^H_{buf_2}[k]\, w[k] \right) - \frac{1}{\mu} \left( y[k]\, y^H[k] - y_{buf_2}[k]\, y^H_{buf_2}[k] \right) w[k] \right\}$$ (equation 53)

with

$$\rho = \frac{\rho'}{\frac{1}{\mu} \left| y^H[k]\, y[k] - y^H_{buf_2}[k]\, y_{buf_2}[k] \right| + y^H_{buf_2}[k]\, y_{buf_2}[k] + \delta}$$ (equation 54)

For reasons of conciseness only the update procedure of the
time-domain stochastic gradient algorithms during noise
only will be considered in the sequel, hence $y[k] = y^n[k]$.
The extension towards updating during speech + noise
periods with the use of a second, noise only buffer $B_2$ is
straightforward: the equations are found by replacing the
noise-only input vector y[k] by $y_{buf_2}[k]$ and the speech +
noise vector $y_{buf_1}[k]$ by the input speech + noise vector
y[k].

It can be shown that the algorithm (eq.51)-(eq.52) is
convergent in the mean provided that the step size $\rho$ is
smaller than $2/\lambda_{max}$ with $\lambda_{max}$ the maximum eigenvalue of
$E\{ \frac{1}{\mu} y_{buf_1} y^H_{buf_1} + (1 - \frac{1}{\mu})\, y^n y^{n,H} \}$. The similarity of (eq.51) with standard
NLMS lets us presume that setting $\rho < 2 / \sum_i \lambda_i$, with $\lambda_i$,
i = 1,...,NL the eigenvalues of $E\{ \frac{1}{\mu} y_{buf_1} y^H_{buf_1} + (1 - \frac{1}{\mu})\, y^n y^{n,H} \} \in \mathbb{R}^{NL \times NL}$, or
-in case of FIR filters- setting

$$\rho < \frac{2}{\frac{1}{\mu} E\{ y^H_{buf_1}[k]\, y_{buf_1}[k] \} + (1 - \frac{1}{\mu})\, E\{ y^{n,H}[k]\, y^n[k] \}}$$ (equation 55)

guarantees convergence in the mean square. Equation (55)
explains the normalisation (eq.52) and (eq.54) for the step
size $\rho$.
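One noise-only iteration of (eq.51) with the normalised step size of (eq.52) might be sketched as follows. This is an illustration under assumptions, not a reference implementation: signals are taken to be real-valued, the function name `sg_update` and its argument names are hypothetical, and reading a vector `y_buf1` out of the circular buffer $B_1$ is left to the caller.

```python
import numpy as np

def sg_update(w, y_n, d_n, y_buf1, inv_mu, rho_prime, delta=1e-8):
    """One noise-only update following (eq.51)-(eq.52), real-valued case.

    w       : current filter, shape (NL,)
    y_n     : current noise-only input vector y^n[k]
    d_n     : delayed speech reference sample y_0^n[k - Delta]
    y_buf1  : speech + noise vector taken from buffer B1
    """
    err = d_n - y_n @ w                                   # a priori error
    # Regularisation term r[k] built from the buffered speech + noise vector.
    r = inv_mu * (y_buf1 * (y_buf1 @ w) - y_n * (y_n @ w))
    speech_energy = abs(y_buf1 @ y_buf1 - y_n @ y_n)      # |.| keeps it positive
    rho = rho_prime / (inv_mu * speech_energy + y_n @ y_n + delta)
    return w + rho * (y_n * err - r)

# Toy check with 1/mu = 0: the update reduces to NLMS and identifies h.
rng = np.random.default_rng(0)
h = rng.standard_normal(4)
w = np.zeros(4)
for _ in range(2000):
    y = rng.standard_normal(4)
    w = sg_update(w, y, h @ y, np.zeros(4), 0.0, 0.5)
print(np.max(np.abs(w - h)))
```

Setting `inv_mu` to a positive value activates the buffered regularisation term; the toy loop above deliberately leaves it off so that the familiar NLMS behaviour is visible.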

[0075] However, since generally

$$y[k]\, y^H[k] \neq y^n_{buf_1}[k]\, y^{n,H}_{buf_1}[k]$$ (equation 56)

the instantaneous gradient estimate in (eq.51) is -compared
to (eq.49)- additionally perturbed by

$$\frac{1}{\mu} \left( y[k]\, y^H[k] - y^n_{buf_1}[k]\, y^{n,H}_{buf_1}[k] \right) w[k]$$ (equation 57)

for $1/\mu \neq 0$. Hence, for $1/\mu \neq 0$, the update equations (eq.51)-
(eq.54) suffer from a larger residual excess error than
(eq.49). This additional excess error grows for decreasing
$\mu$, increasing step size $\rho$ and increasing vector length
of the vector y. It is expected to be especially large for
highly non-stationary noise, e.g. multi-talker babble
noise.

Remark that for $\mu>1$, an alternative stochastic gradient
algorithm can be derived from algorithm (eq.51)-(eq.54) by
invoking some independence assumptions. Simulations,
however, showed that these independence assumptions result
in a significant performance degradation, while hardly
reducing the computational complexity.



Frequency-domain implementation

[0076] As stated before, the stochastic gradient
algorithm (eq.51)-(eq.54) is expected to suffer from a
large excess error for large $\rho'/\mu$ and/or highly
time-varying noise, due to a large difference between the
rank-one noise correlation matrices $y^n[k]\, y^{n,H}[k]$ measured at
different time instants k. The gradient estimate can be
improved by replacing

$$y_{buf_1}[k]\, y^H_{buf_1}[k] - y[k]\, y^H[k]$$ (equation 58)

in (eq.51) with the time-average

$$\frac{1}{K} \sum_{l=k-K+1}^{k} y_{buf_1}[l]\, y^H_{buf_1}[l] - \frac{1}{K} \sum_{l=k-K+1}^{k} y[l]\, y^H[l]$$ (equation 59)

where $\frac{1}{K} \sum_{l=k-K+1}^{k} y_{buf_1}[l]\, y^H_{buf_1}[l]$ is updated during periods of speech
+ noise and $\frac{1}{K} \sum_{l=k-K+1}^{k} y[l]\, y^H[l]$ during periods of noise only.
However, this would require expensive matrix operations. A
block-based implementation intrinsically performs this
averaging:

$$w[(k+1)K] = w[kK] + \frac{\rho}{K} \left\{ \sum_{i=0}^{K-1} y[kK+i] \left( y^{*}_0[kK+i-\Delta] - y^H[kK+i]\, w[kK] \right) - \frac{1}{\mu} \sum_{i=0}^{K-1} \left( y_{buf_1}[kK+i]\, y^H_{buf_1}[kK+i] - y[kK+i]\, y^H[kK+i] \right) w[kK] \right\}$$ (equation 60)

The gradient and hence also $y_{buf_1}[k]\, y^H_{buf_1}[k] - y[k]\, y^H[k]$ is averaged
over K iterations prior to making adjustments to w. This
goes at the expense of a reduced (i.e. by a factor K)
convergence rate.
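The block-based averaging can be sketched in the time domain before moving to FFTs. The code below is an assumption-laden illustration: signals are real-valued, the name `block_sg_update` is hypothetical, the rows of `Y_n` and `Y_buf1` hold the K noise-only and buffered speech + noise vectors of one block, and the gradient is scaled by `rho / K` (a conventional block-LMS normalisation assumed here).

```python
import numpy as np

def block_sg_update(w, Y_n, d_n, Y_buf1, inv_mu, rho):
    """Accumulate the gradient over K samples, then adjust w once.

    Y_n, Y_buf1 : arrays of shape (K, NL), one input vector per row
    d_n         : array of shape (K,), delayed speech reference samples
    """
    K = Y_n.shape[0]
    err = d_n - Y_n @ w                       # K a priori errors, all using w[kK]
    grad_noise = Y_n.T @ err                  # sum_i y[kK+i] * e[kK+i]
    # Averaged regularisation term: sum_i (y_buf1 y_buf1^T - y y^T) w
    reg = inv_mu * (Y_buf1.T @ (Y_buf1 @ w) - Y_n.T @ (Y_n @ w))
    return w + (rho / K) * (grad_noise - reg)

rng = np.random.default_rng(3)
w_demo = rng.standard_normal(5)
Y_n_demo = rng.standard_normal((8, 5))
Y_buf_demo = rng.standard_normal((8, 5))
d_demo = rng.standard_normal(8)
print(block_sg_update(w_demo, Y_n_demo, d_demo, Y_buf_demo, 2.0, 0.05))
```

The filter is held fixed within a block, which is what allows the sums to be computed with fast convolutions and correlations.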

[0077] The block-based implementation is
computationally more efficient when it is implemented in
the frequency-domain, especially for large filter lengths:
the linear convolutions and correlations can then be
efficiently realised by FFT algorithms based on
overlap-save or overlap-add. In addition, in a frequency-domain
implementation, each frequency bin gets its own step size,
resulting in faster convergence compared to a time-domain
implementation while not degrading the steady-state excess
MSE.
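The overlap-save principle mentioned above can be shown with a short sketch. It is an illustration only, with a hypothetical function name: an L-tap filter is applied with 2L-point FFTs, each input segment consists of the previous block followed by the new block, and only the last L samples of every inverse FFT are kept, since the first L samples are corrupted by circular wrap-around.

```python
import numpy as np

def overlap_save_filter(x, w):
    """Filter x with FIR taps w using 2L-point FFTs (L = len(w), 50% overlap)."""
    L = len(w)
    W = np.fft.fft(np.concatenate([w, np.zeros(L)]))    # zero-padded filter spectrum
    n_blocks = len(x) // L
    x_pad = np.concatenate([np.zeros(L), x[:n_blocks * L]])
    out = np.empty(n_blocks * L)
    for k in range(n_blocks):
        segment = x_pad[k * L:(k + 2) * L]               # previous block + new block
        y = np.fft.ifft(np.fft.fft(segment) * W).real    # circular convolution
        out[k * L:(k + 1) * L] = y[L:]                   # keep the valid last L samples
    return out

rng = np.random.default_rng(7)
x_demo = rng.standard_normal(64)
w_demo = rng.standard_normal(8)
y_fast = overlap_save_filter(x_demo, w_demo)
y_direct = np.convolve(x_demo, w_demo)[:64]
print(np.max(np.abs(y_fast - y_direct)))
```

Keeping the last L samples corresponds to a selection matrix of the form $[0_L \; I_L]$ applied after the inverse DFT.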

[0078] Algorithm 1 summarises a frequency-domain
implementation based on overlap-save of (eq.51)-(eq.54).
Algorithm 1 requires (3N+4) FFTs of length 2L. By storing
the FFT-transformed speech + noise and noise only vectors
in the buffers $B_1 \in \mathbb{C}^{N \times L_{buf_1}}$ and $B_2 \in \mathbb{C}^{N \times L_{buf_2}}$, respectively,
instead of storing the time-domain vectors, N FFT
operations can be saved. Note that since the input signals
are real, half of the FFT components are
complex-conjugated. Hence, in practice only half of the complex FFT
components have to be stored in memory. When adapting
during speech + noise, also the time-domain vector

$$[\, y_0[kL-\Delta] \;\; \ldots \;\; y_0[kL-\Delta+L-1] \,]^T$$ (equation 61)

should be stored in an additional buffer $B_{2,0} \in \mathbb{R}^{1 \times L_{buf_2}/2}$ during
periods of noise-only, which -for N=M- results in an
additional storage of $L_{buf_2}/2$ words compared to when the
time-domain vectors are stored into the buffers $B_1$ and $B_2$.
Remark that in Algorithm 1 a common trade-off parameter $\mu$
is used in all frequency bins. Alternatively, a different
setting for $\mu$ can be used in different frequency bins. E.g.
for SP-SDW-MWF with $w_0=0$, $1/\mu$ could be set to 0 at those
frequencies where the GSC is sufficiently robust, e.g., for
small-sized arrays at high frequencies. In that case, only
a few frequency components of the regularisation terms
$R_i[k]$, i = M-N,...,M-1, need to be computed, reducing the
computational complexity.



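Algorithm 1: Frequency-domain stochastic gradient SP-SDW-MWF
based on overlap-save

Initialisation:
   $W_i[0] = [\, 0 \;\; \ldots \;\; 0 \,]^T$, i = M-N,...,M-1
   $P_m[0] = \delta_m$, m = 0,...,2L-1

Matrix definitions:
   $g = \begin{bmatrix} I_L & 0_L \\ 0_L & 0_L \end{bmatrix}$; $k = [\, 0_L \;\; I_L \,]$; F = 2L x 2L DFT matrix

For each new block of NL input samples:

♦ If noise detected:
   1. $F [\, y_i[kL-L] \;\; \ldots \;\; y_i[kL+L-1] \,]^T$, i = M-N,...,M-1 -> noise buffer $B_2$
      $[\, y_0[kL-\Delta] \;\; \ldots \;\; y_0[kL-\Delta+L-1] \,]^T$ -> noise buffer $B_{2,0}$
   2. $Y^n_i[k] = \mathrm{diag}\{ F [\, y_i[kL-L] \;\; \ldots \;\; y_i[kL+L-1] \,]^T \}$, i = M-N,...,M-1
      $d[k] = [\, y_0[kL-\Delta] \;\; \ldots \;\; y_0[kL-\Delta+L-1] \,]^T$
      Create $Y_i[k]$ from data in speech + noise buffer $B_1$.

♦ If speech detected:
   1. $F [\, y_i[kL-L] \;\; \ldots \;\; y_i[kL+L-1] \,]^T$, i = M-N,...,M-1 -> speech + noise buffer $B_1$
   2. $Y_i[k] = \mathrm{diag}\{ F [\, y_i[kL-L] \;\; \ldots \;\; y_i[kL+L-1] \,]^T \}$, i = M-N,...,M-1
      Create d[k] and $Y^n_i[k]$ from noise buffers $B_{2,0}$ and $B_2$.

♦ Update formula:
   1. $e_1[k] = k F^{-1} \sum_{j=M-N}^{M-1} Y^n_j[k]\, W_j[k] = y_{out,1}$
      $e_2[k] = k F^{-1} \sum_{j=M-N}^{M-1} Y_j[k]\, W_j[k] = y_{out,2}$
      $e[k] = d[k] - e_1[k]$
      $E_1[k] = F k^T e_1[k]$; $E_2[k] = F k^T e_2[k]$; $E[k] = F k^T e[k]$
   2. $\Lambda[k] = \frac{2\rho'}{L}\, \mathrm{diag}\{ P_0^{-1}[k], \ldots, P_{2L-1}^{-1}[k] \}$
      $P_m[k] = \gamma P_m[k-1] + (1-\gamma) \left( \sum_{j=M-N}^{M-1} |Y^n_{j,m}|^2 + \frac{1}{\mu} \left| \sum_{j=M-N}^{M-1} ( |Y_{j,m}|^2 - |Y^n_{j,m}|^2 ) \right| \right)$
   3. $W_i[k+1] = W_i[k] + F g F^{-1} \Lambda[k] \left\{ Y^{n,H}_i[k]\, E[k] - \frac{1}{\mu} \left( Y^H_i[k]\, E_2[k] - Y^{n,H}_i[k]\, E_1[k] \right) \right\}$,
      (i = M-N,...,M-1)

♦ Output: $y_0[k] = [\, y_0[kL-\Delta] \;\; \ldots \;\; y_0[kL-\Delta+L-1] \,]^T$
   • If noise detected: $y_{out}[k] = y_0[k] - y_{out,1}[k]$
   • If speech detected: $y_{out}[k] = y_0[k] - y_{out,2}[k]$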

Improvement 1: stochastic gradient algorithm with low pass
filter

[0079] For spectrally stationary noise, the limited
(i.e. K=L) averaging of (eq.59) by the block-based and
frequency-domain stochastic gradient implementation may
offer a reasonable estimate of the short-term speech
correlation matrix $E\{ y^s y^{s,H} \}$. However, in practical
scenarios, the speech and the noise signals are often
spectrally highly non-stationary (e.g. multi-talker babble
noise) while their long-term spectral and spatial
characteristics (e.g. the positions of the sources) usually
vary more slowly in time. For these scenarios, a reliable
estimate of the long-term speech correlation matrix
$E\{ y^s y^{s,H} \}$ that captures the spatial rather than the
short-term spectral characteristics can still be obtained by
averaging (eq.59) over K>>L samples. Spectrally highly
non-stationary noise can then still be spatially suppressed by
using an estimate of the long-term speech correlation
matrix in the regularisation term r[k]. A cheap method to
incorporate a long-term averaging (K>>L) of (eq.59) in the
stochastic gradient algorithm is now proposed, by low pass
filtering the part of the gradient estimate that takes
speech distortion into account (i.e. the term r[k] in
(eq.51)). The averaging method is first explained for the
time-domain algorithm (eq.51)-(eq.54) and then translated
to the frequency-domain implementation.
Assume that the long-term spectral and spatial
characteristics of the noise are quasi-stationary during at
least K speech + noise samples and K noise samples. A
reliable estimate of the long-term speech correlation
matrix $E\{ y^s y^{s,H} \}$ is then obtained by (eq.59) with K>>L. To
avoid expensive matrix computations, r[k] can be
approximated by

$$\frac{1}{\mu} \frac{1}{K} \sum_{l=k-K+1}^{k} \left( y_{buf_1}[l]\, y^H_{buf_1}[l] - y[l]\, y^H[l] \right) w[l]$$ (equation 62)

Since the filter coefficients w of a stochastic gradient
algorithm vary slowly in time, (eq.62) appears a good
approximation of r[k], especially for small step size $\rho'$.
The averaging operation (eq.62) is performed by applying a
low pass filter to r[k] in (eq.51):

$$r[k] = \tilde{\lambda}\, r[k-1] + (1 - \tilde{\lambda})\, \frac{1}{\mu} \left( y_{buf_1}[k]\, y^H_{buf_1}[k] - y[k]\, y^H[k] \right) w[k]$$ (equation 63)

where $\tilde{\lambda} < 1$. This corresponds to an averaging window K of
about $\frac{1}{1-\tilde{\lambda}}$ samples. The normalised step size $\rho$ is modified
into

$$\rho = \frac{\rho'}{r_{avg}[k] + y^H[k]\, y[k] + \delta}$$ (equation 64)

$$r_{avg}[k] = \tilde{\lambda}\, r_{avg}[k-1] + (1 - \tilde{\lambda})\, \frac{1}{\mu} \left| y^H_{buf_1}[k]\, y_{buf_1}[k] - y^H[k]\, y[k] \right|$$ (equation 65)

Compared to (eq.51), (eq.63) requires 3NL-1 additional MAC
and extra storage of the NLx1 vector r[k].
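The recursions (eq.63)-(eq.65) amount to first-order low pass filters on the regularisation term and on its energy estimate. The following sketch assumes real-valued signals; the function names and the argument `lam` (standing for the time-domain forgetting factor) are illustrative only.

```python
import numpy as np

def lowpass_regulariser(r_prev, r_avg_prev, w, y_n, y_buf1, inv_mu, lam):
    """Return the smoothed r[k] of (eq.63) and the smoothed energy of (eq.65)."""
    inst = inv_mu * (y_buf1 * (y_buf1 @ w) - y_n * (y_n @ w))   # instantaneous term
    r = lam * r_prev + (1.0 - lam) * inst
    energy = inv_mu * abs(y_buf1 @ y_buf1 - y_n @ y_n)
    r_avg = lam * r_avg_prev + (1.0 - lam) * energy
    return r, r_avg

def normalised_step(rho_prime, r_avg, y_n, delta=1e-8):
    """Normalised step size of (eq.64)."""
    return rho_prime / (r_avg + y_n @ y_n + delta)

def averaging_window(lam):
    """Approximate averaging window K = 1 / (1 - lambda), in samples."""
    return 1.0 / (1.0 - lam)

print(averaging_window(0.999))    # roughly 1000 samples
```

A forgetting factor close to 1 gives a long window, which matches the aim of tracking the slowly varying spatial characteristics rather than the short-term spectrum.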

[0080] Equation (63) can be easily extended to the
frequency-domain. The update equation for $W_i[k+1]$ in
Algorithm 1 then becomes (Algorithm 2):

$$W_i[k+1] = W_i[k] + F g F^{-1} \Lambda[k] \left( Y^{n,H}_i[k]\, E[k] - R_i[k] \right);$$
$$R_i[k] = \lambda R_i[k-1] + (1-\lambda)\, \frac{1}{\mu} \left( Y^H_i[k]\, E_2[k] - Y^{n,H}_i[k]\, E_1[k] \right)$$ (equation 66)

with

$$E[k] = F k^T \left( d[k] - k F^{-1} \sum_{j=M-N}^{M-1} Y^n_j[k]\, W_j[k] \right)$$ (equation 67)
$$E_1[k] = F k^T k F^{-1} \sum_{j=M-N}^{M-1} Y^n_j[k]\, W_j[k]$$ (equation 68)
$$E_2[k] = F k^T k F^{-1} \sum_{j=M-N}^{M-1} Y_j[k]\, W_j[k]$$ (equation 69)

and $\Lambda[k]$ computed as follows:

$$\Lambda[k] = \frac{2\rho'}{L}\, \mathrm{diag}\{ P_0^{-1}[k], \ldots, P_{2L-1}^{-1}[k] \}$$ (equation 70)
$$P_m[k] = \gamma P_m[k-1] + (1-\gamma) \left( P_{1,m}[k] + P_{2,m}[k] \right)$$ (equation 71)
$$P_{1,m}[k] = \sum_{j=M-N}^{M-1} |Y^n_{j,m}[k]|^2$$ (equation 72)
$$P_{2,m}[k] = \lambda P_{2,m}[k-1] + (1-\lambda)\, \frac{1}{\mu} \left| \sum_{j=M-N}^{M-1} \left( |Y_{j,m}[k]|^2 - |Y^n_{j,m}[k]|^2 \right) \right|$$ (equation 73)

Compared to Algorithm 1, (eq.66)-(eq.69) require one extra
2L-point FFT and 8NL-2N-2L extra MAC per L samples and
additional memory storage of a 2NLx1 real data vector. To
obtain the same time constant in the averaging operation as
in the time-domain version with K=1, $\lambda$ should equal $\tilde{\lambda}^L$.
The experimental results that follow will show that the
performance of the stochastic gradient algorithm is
significantly improved by the low pass filter, especially
for large $\lambda$.

[0081] Now the computational complexity of the different stochastic gradient algorithms is discussed. Table 1 summarises the computational complexity (expressed as the number of real multiply-accumulates (MAC), divisions (D), square roots (Sq) and absolute values (Abs)) of the time-domain (TD) and the frequency-domain (FD) Stochastic Gradient (SG) based algorithms. Comparison is made with standard NLMS and the NLMS based SPA. One complex multiplication is assumed to be equivalent to 4 real multiplications and 2 real additions. A 2L-point FFT of a real input vector requires 2L log2(2L) real MAC (assuming a radix-2 FFT algorithm).

Table 1 indicates that the TD-SG algorithm without filter w0 and the SPA are about twice as complex as the standard ANC. When applying a Low Pass filter (LP) to the regularisation term, the TD-SG algorithm has about three times the complexity of the ANC. The increase in complexity of the frequency-domain implementations is smaller.



Table 1

Domain | Algorithm | Update formula | Step size adaptation
TD | NLMS ANC | ((2M-2)L + 1) MAC | 1 D + (M-1)L MAC
TD | NLMS based SPA | (4(M-1)L + 1) MAC + 1 D + 1 Sq | 1 D + (M-1)L MAC
TD | SG | (4NL + 5) MAC | 1 D + 1 Abs + (2NL + 2) MAC
TD | SG with LP | (7NL + 4) MAC | 1 D + 1 Abs + (2NL + 4) MAC
FD | NLMS ANC | (10M - 7 - […]) + (6M-2) log2 2L MAC | 1 D + (2M + 2) MAC
FD | NLMS based SPA | (14M - 11 - […]) + (6M-2) log2 2L MAC + 1/L Sq + 1/L D | 1 D + (2M + 2) MAC
FD | SG (Algorithm 1) | (18N + 6 - […]) + (6N + 8) log2 2L MAC | 1 D + 1 Abs + (4N + 4) MAC
FD | SG with LP (Algorithm 2) | (26N + 4 - […]) + (6N + 10) log2 2L MAC | 1 D + 1 Abs + (4N + 6) MAC


[0082] As an illustration, Fig. 9 plots the complexity (expressed as the number of Mega operations per second (Mops)) of the time-domain and the frequency-domain stochastic gradient algorithm with LP filter as a function of L for M=3 and a sampling frequency fs=16 kHz. Comparison is made with the NLMS-based ANC of the GSC and the SPA. The complexity of the FD SPA is not depicted, since for small M it is comparable to the cost of the FD-NLMS ANC. For L>8, the frequency-domain implementations result in a significantly lower complexity compared to their time-domain equivalents. The computational complexity of the FD stochastic gradient algorithm with LP is limited, making it a good alternative to the SPA for implementation in hearing aids.
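As a rough check of the time-domain entries of Table 1, the per-sample operation counts can be turned into Mops. The sketch below rests on assumptions that are not stated in the text: every division and absolute value is counted as one operation, the step-size term of the TD-SG algorithm with LP is read as (2NL + 4) MAC, and N = M-1 (no filter w0). The function names are illustrative only.

```python
def td_nlms_anc_ops(M: int, L: int) -> int:
    """Operations per sample of the time-domain NLMS ANC (Table 1)."""
    update = (2 * M - 2) * L + 1      # update formula, MAC
    step = 1 + (M - 1) * L            # 1 D + (M-1)L MAC
    return update + step


def td_sg_lp_ops(N: int, L: int) -> int:
    """Operations per sample of the time-domain SG algorithm with LP filter."""
    update = 7 * N * L + 4            # update formula, MAC
    step = 1 + 1 + (2 * N * L + 4)    # 1 D + 1 Abs + (2NL + 4) MAC (assumed reading)
    return update + step


def mops(ops_per_sample: float, fs: float = 16000.0) -> float:
    """Mega operations per second at sampling frequency fs."""
    return ops_per_sample * fs / 1e6


M, L = 3, 32
anc = td_nlms_anc_ops(M, L)
sg_lp = td_sg_lp_ops(M - 1, L)
print(f"TD NLMS ANC: {mops(anc):.2f} Mops, TD SG with LP: {mops(sg_lp):.2f} Mops, "
      f"ratio {sg_lp / anc:.2f}")
```

Under these assumptions the ratio comes out close to three, in line with the statement that the TD-SG algorithm with LP has about three times the complexity of the ANC.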

In Table 1 and Fig. 9 the complexity of the time-domain and the frequency-domain NLMS ANC and NLMS based SPA represents the complexity when the adaptive filter is only updated during noise only. If the adaptive filter is also updated during speech + noise using data from a noise buffer, the time-domain implementations additionally require NL MAC per sample and the frequency-domain implementations additionally require 2 FFT and (4L(M-1) - 2(M-1) + L) MAC per L samples.

[0083] The performance of the different FD stochastic gradient implementations of the SP-SDW-MWF is evaluated based on experimental results for a hearing aid application. Comparison is made with the FD-NLMS based SPA. For a fair comparison, the FD-NLMS based SPA is, like the stochastic gradient algorithms, also adapted during speech + noise using data from a noise buffer.

[0084] The set-up is the same as described before (see also Fig. 5). The performance of the FD stochastic gradient algorithms is evaluated for a filter length L=32 taps per channel, ρ'=0.8 and γ=0.[…]. To exclude the effect of the spatial pre-processor, the performance measures are calculated w.r.t. the output of the fixed beamformer. The sensitivity of the algorithms against errors in the assumed signal model is illustrated for microphone mismatch, e.g. a gain mismatch Υ2=4 dB of the second microphone.
[0085] Fig. 10(a) and (b) compare the performance of the different FD Stochastic Gradient (SG) SP-SDW-MWF algorithms without w0 (i.e., the SDR-GSC) as a function of the trade-off parameter μ for a stationary and a non-stationary (e.g. multi-talker babble) noise source, respectively, at 90°. To analyse the impact of the approximation (eq.50) on the performance, the result of a FD implementation of (eq.49), which uses the clean speech, is depicted too. This algorithm is referred to as the optimal FD-SG algorithm. Without Low Pass (LP) filter, the stochastic gradient algorithm achieves a worse performance than the optimal FD-SG algorithm (eq.49), especially for large 1/μ. For a stationary speech-like noise source, the FD-SG algorithm does not suffer too much from approximation (eq.50). In a highly time-varying noise scenario, such as multi-talker babble, the limited averaging of r[k] in the FD implementation does not suffice to maintain the large noise reduction achieved by (eq.49). The loss in noise reduction performance could be reduced by decreasing the step size ρ', at the expense of a reduced convergence speed. Applying the low pass filter (eq.66) with e.g. λ=0.999 significantly improves the performance for all 1/μ, while changes in the noise scenario can still be tracked.
[0086] Fig. 11 plots the SNR improvement ΔSNR_intellig and the speech distortion SD_intellig of the SP-SDW-MWF (1/μ=0.5) with and without filter w0 for the babble noise scenario as a function of […], where λ is the exponential weighting factor of the LP filter (see (eq.66)). Performance clearly improves for increasing λ. For small λ, the SP-SDW-MWF with w0 suffers from a larger excess error, and hence worse ΔSNR_intellig, compared to the SP-SDW-MWF without w0. This is due to the larger dimensions of (1/μ)E{y^s y^{s,H}}.

[0087] The LP filter reduces fluctuations in the filter weights W_i[k] caused by poor estimates of the short-term speech correlation matrix E{y^s y^{s,H}} and/or by the highly non-stationary short-term speech spectrum. In contrast to a decrease in step size ρ', the LP filter does not compromise tracking of changes in the noise scenario. As an illustration, Fig. 12 plots the convergence behaviour of the FD stochastic gradient algorithm without w0 (i.e. the SDR-GSC) for λ=0 and λ=0.9998, respectively, when the noise source position suddenly changes from 90° to 180°. A gain mismatch of 4 dB was applied to the second microphone. To avoid fast fluctuations in the residual noise energy and the speech distortion energy, the desired and the interfering noise source in this experiment are stationary and speech-like. The upper figure depicts the residual noise energy ε_n² as a function of the number of input samples; the lower figure plots the residual speech distortion during speech + noise periods as a function of the number of speech + noise samples. Both algorithms (i.e., λ=0 and λ=0.9998) have about the same convergence rate. When the change in position occurs, the algorithm with λ=0.9998 even converges faster. For λ=0, the approximation error (eq.50) remains large for a while since the noise vectors in the buffer are not up to date. For λ=0.9998, the impact of the instantaneous large approximation error is reduced thanks to the low pass filter.

[0088] Fig. 13 and Fig. 14 compare the performance of the FD stochastic gradient algorithm with LP filter (λ=0.9998) and the FD-NLMS based SPA in a multiple noise source scenario. The noise scenario consists of 5 multi-talker babble noise sources positioned at angles 75°, 120°, 180°, 240°, 285° w.r.t. the desired source at 0°. To assess the sensitivity of the algorithms against errors in the assumed signal model, the influence of microphone mismatch, i.e. gain mismatch Υ2=4 dB of the second microphone, on the performance is depicted too. In Fig. 13, the SNR improvement ΔSNR_intellig and the speech distortion SD_intellig of the SP-SDW-MWF with and without filter w0 are depicted as a function of the trade-off parameter 1/μ. Fig. 14 shows the performance of the QIC-GSC

$$\mathbf{w}^{H}\mathbf{w} \le \beta^{2} \qquad \text{(equation 74)}$$

for different constraint values β², which is implemented using the FD-NLMS based SPA.
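A common way to enforce a quadratic inequality constraint of this kind after each filter update is to scale the coefficient vector back onto the constraint boundary whenever its squared norm exceeds β². The following is a minimal sketch of that scaling step only, not the complete FD-NLMS based SPA; the function name and the test vectors are illustrative.

```python
import numpy as np


def scaled_projection(w: np.ndarray, beta_sq: float) -> np.ndarray:
    """Return w unchanged if w^H w <= beta_sq, else rescale it so that w^H w = beta_sq."""
    norm_sq = np.vdot(w, w).real          # w^H w
    if norm_sq > beta_sq:
        # One division and one square root per projection
        w = w * np.sqrt(beta_sq / norm_sq)
    return w


w_large = np.array([3.0, 4.0])            # squared norm 25, violates beta_sq = 1
w_small = np.array([0.1, 0.2])            # already satisfies the constraint
print(scaled_projection(w_large, 1.0), scaled_projection(w_small, 1.0))
```

The single division and square root in the scaling step correspond to the extra "1 D + 1 Sq" listed for the NLMS based SPA in Table 1.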

The SPA and the stochastic gradient based SP-SDW-MWF both increase the robustness of the GSC (i.e., the SP-SDW-MWF without w0 and 1/μ=0). For a given maximum allowable speech distortion SD_intellig, the SP-SDW-MWF with and without w0 achieve a better noise reduction performance than the SPA. The performance of the SP-SDW-MWF with w0 is, in contrast to the SP-SDW-MWF without w0, not affected by microphone mismatch. In the absence of model errors, the SP-SDW-MWF with w0 achieves a slightly worse performance than the SP-SDW-MWF without w0. This can be explained by the fact that with w0, the estimate of (1/μ)E{y^s y^{s,H}} is less accurate due to the larger dimensions of (1/μ)E{y^s y^{s,H}} (see also Fig. 11). In conclusion, the proposed stochastic gradient implementation of the SP-SDW-MWF preserves the benefit of the SP-SDW-MWF over the QIC-GSC.

Improvement 2: frequency-domain stochastic gradient algorithm using correlation matrices

[0089] It is now shown that by approximating the regularisation term in the frequency-domain, (diagonal) speech and noise correlation matrices can be used instead of data buffers, such that the memory usage is decreased drastically, while also the computational complexity is further reduced. Experimental results demonstrate that this approximation results in a small (positive or negative) performance difference compared to the stochastic gradient algorithm with low pass filter, such that the proposed algorithm preserves the robustness benefit of the SP-SDW-MWF over the QIC-GSC, while both its computational complexity and memory usage are now comparable to the NLMS-based SPA for implementing the QIC-GSC.

[0090] As the estimate of r[k] in (eq.51) proved to be quite poor, resulting in a large excess error, it was suggested in (eq.59) to use an estimate of the average clean speech correlation matrix. This allows r[k] to be computed as

$$\mathbf{r}[k] = \frac{1}{\mu}(1-\lambda)\sum_{l=0}^{k}\lambda^{k-l}\left(\mathbf{y}_{buf_1}[l]\,\mathbf{y}_{buf_1}^{H}[l] - \mathbf{y}^{n}[l]\,\mathbf{y}^{n,H}[l]\right)\cdot\mathbf{w}[k] \qquad \text{(equation 75)}$$

with λ an exponential weighting factor. For stationary noise a small λ, i.e. 1/(1-λ) ~ NL, suffices. However, in practice the speech and the noise signals are often spectrally highly non-stationary (e.g. multi-talker babble noise), whereas their long-term spectral and spatial characteristics usually vary more slowly in time. Spectrally highly non-stationary noise can still be spatially suppressed by using an estimate of the long-term correlation matrix in r[k], i.e. 1/(1-λ) ≫ NL.

In order to avoid expensive matrix operations for computing (eq.75), it was previously assumed that w[k] varies slowly in time, i.e. w[k] ≈ w[l], such that (eq.75) can be approximated with vector instead of matrix operations by directly applying a low pass filter to the regularisation term r[k], cf. (eq.63),

$$\mathbf{r}[k] = \frac{1}{\mu}(1-\lambda)\sum_{l=0}^{k}\lambda^{k-l}\left(\mathbf{y}_{buf_1}[l]\,\mathbf{y}_{buf_1}^{H}[l] - \mathbf{y}^{n}[l]\,\mathbf{y}^{n,H}[l]\right)\mathbf{w}[l] \qquad \text{(equation 76)}$$

$$\phantom{\mathbf{r}[k]} = \lambda\,\mathbf{r}[k-1] + (1-\lambda)\,\frac{1}{\mu}\left(\mathbf{y}_{buf_1}[k]\,\mathbf{y}_{buf_1}^{H}[k] - \mathbf{y}^{n}[k]\,\mathbf{y}^{n,H}[k]\right)\mathbf{w}[k]. \qquad \text{(equation 77)}$$

However, this assumption is actually not required in a frequency-domain implementation, as will now be shown.
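The equivalence between the explicit exponentially weighted sum and its recursive low-pass form can be checked numerically. The sketch below uses random real-valued vectors in place of the buffered speech + noise vectors, the noise vectors and the filter trajectory; the dimensions, λ and μ are arbitrary illustrative choices. The recursive version evaluates (y y^H) w as y (y^H w), so it needs only vector operations.

```python
import numpy as np


def regularisation_direct(y_sn, y_n, w, lam, mu):
    """Explicit weighted sum over l = 0..k with w[l] inside the sum (matrix products)."""
    k = len(w) - 1
    r = np.zeros_like(w[0])
    for l in range(k + 1):
        diff = np.outer(y_sn[l], y_sn[l].conj()) - np.outer(y_n[l], y_n[l].conj())
        r += (1 - lam) / mu * lam ** (k - l) * (diff @ w[l])
    return r


def regularisation_recursive(y_sn, y_n, w, lam, mu):
    """First-order low-pass recursion using vector operations only."""
    r = np.zeros_like(w[0])
    for l in range(len(w)):
        # (y y^H - n n^H) w evaluated as y (y^H w) - n (n^H w)
        term = y_sn[l] * np.vdot(y_sn[l], w[l]) - y_n[l] * np.vdot(y_n[l], w[l])
        r = lam * r + (1 - lam) / mu * term
    return r


rng = np.random.default_rng(0)
n_taps, n_steps = 4, 50                      # illustrative sizes
y_sn = rng.standard_normal((n_steps, n_taps))
y_n = rng.standard_normal((n_steps, n_taps))
w = rng.standard_normal((n_steps, n_taps))

r_direct = regularisation_direct(y_sn, y_n, w, lam=0.9, mu=2.0)
r_recursive = regularisation_recursive(y_sn, y_n, w, lam=0.9, mu=2.0)
print(np.allclose(r_direct, r_recursive))
```

The recursion costs a few inner products per step, whereas the direct form with a fixed w[k] would require accumulating a full correlation matrix, which is the trade-off discussed in the text.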
[0091] The frequency-domain algorithm called Algorithm 2 requires large data buffers and hence the storage of a large amount of data (note that to achieve a good performance, typical values for the buffer lengths of the circular buffers B1 and B2 are 10000...20000). A substantial memory (and computational complexity) reduction can be achieved by the following two steps:

• When using (eq.75) instead of (eq.77) for calculating the regularisation term, correlation matrices instead of data samples need to be stored. The frequency-domain implementation of the resulting algorithm is summarised in Algorithm 3, where 2L×2L-dimensional speech and noise correlation matrices S_ij[k] and S^n_ij[k], i,j = M-N...M-1, are used for calculating the regularisation term R_i[k] and (part of) the step size Λ[k]. These correlation matrices are updated during speech + noise periods and noise only periods, respectively. When using correlation matrices, filter adaptation can only take place during noise only periods, since during speech + noise periods the desired signal cannot be constructed from the noise buffer B2 anymore. This first step however does not necessarily reduce the memory usage (N·L_buf1 for data buffers vs. 2(NL)² for correlation matrices) and will even increase the computational complexity, since the correlation matrices are not diagonal.

• The correlation matrices in the frequency-domain can be approximated by diagonal matrices, since F k^T k F^{-1} in Algorithm 3 can be well approximated by I_{2L}/2. Hence, the speech and the noise correlation matrices are updated as

$$\mathbf{S}_{ij}[k] = \lambda\,\mathbf{S}_{ij}[k-1] + (1-\lambda)\,\mathbf{Y}_i^{H}[k]\,\mathbf{Y}_j[k]/2, \qquad \text{(equation 78)}$$

$$\mathbf{S}_{ij}^{n}[k] = \lambda\,\mathbf{S}_{ij}^{n}[k-1] + (1-\lambda)\,\mathbf{Y}_i^{n,H}[k]\,\mathbf{Y}_j^{n}[k]/2, \qquad \text{(equation 79)}$$

leading to a significant reduction in memory usage and computational complexity, while having a minimal impact on the performance and the robustness. This algorithm will be referred to as Algorithm 4.
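The diagonal approximation can be made concrete with a small numerical sketch. It assumes k = [0_L I_L] (the matrix selecting the last L of 2L samples) and stores each diagonal matrix Y_i[k] as a length-2L vector of frequency bins; the block length and the helper name `update_diag_corr` are illustrative. The diagonal of F k^T k F^{-1} is exactly 1/2 in every bin, and the approximation consists of neglecting its off-diagonal entries.

```python
import numpy as np

L = 8                                           # illustrative block length
F = np.fft.fft(np.eye(2 * L))                   # 2L x 2L DFT matrix
k = np.hstack([np.zeros((L, L)), np.eye(L)])    # k = [0_L  I_L] (assumed definition)
P = F @ k.T @ k @ np.linalg.inv(F)              # F k^T k F^{-1}

print(np.allclose(np.diag(P), 0.5))             # diagonal entries all equal 1/2


def update_diag_corr(S_prev, Y_i, Y_j, lam):
    """Diagonal correlation update: lam * S + (1 - lam) * conj(Y_i) * Y_j / 2, per bin."""
    return lam * S_prev + (1 - lam) * np.conj(Y_i) * Y_j / 2


# Example: one update of an auto-correlation term from a block of 2L samples
x = np.arange(2 * L, dtype=float)
Y = np.fft.fft(x)
S = update_diag_corr(np.zeros(2 * L, dtype=complex), Y, Y, lam=0.5)
```

Storing only 2L values per channel pair instead of a 2L×2L matrix is what brings the memory requirement down to the order of L·N².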
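Algorithm 3: Frequency-domain implementation with correlation matrices (without approximation)

Initialisation and matrix definitions:

$\mathbf{W}_i[0] = [0 \cdots 0]^{T},\ i = M-N \ldots M-1$
$P_m[0] = \delta_m,\ m = 0 \ldots 2L-1$
$\mathbf{F}$ = 2L×2L-dimensional DFT matrix
$\mathbf{g} = \begin{bmatrix}\mathbf{I}_L & \mathbf{0}_L\\ \mathbf{0}_L & \mathbf{0}_L\end{bmatrix},\quad \mathbf{k} = [\mathbf{0}_L\ \ \mathbf{I}_L]$
$\mathbf{0}_L$ = L×L-dim. zero matrix, $\mathbf{I}_L$ = L×L-dim. identity matrix

For each new block of L samples (per channel):

$\mathbf{d}[k] = [\,y_0[kL-\Delta] \cdots y_0[kL-\Delta+L-1]\,]^{T}$
$\mathbf{Y}_i[k] = \mathrm{diag}\{\mathbf{F}\,[\,y_i[kL-L] \cdots y_i[kL+L-1]\,]^{T}\},\ i = M-N \ldots M-1$

Output signal:

$\mathbf{e}[k] = \mathbf{d}[k] - \mathbf{k}\mathbf{F}^{-1}\sum_{j=M-N}^{M-1}\mathbf{Y}_j[k]\,\mathbf{W}_j[k],\quad \mathbf{E}[k] = \mathbf{F}\mathbf{k}^{T}\mathbf{e}[k]$

If speech detected:

$\mathbf{S}_{ij}[k] = (1-\lambda)\sum_{l=0}^{k}\lambda^{k-l}\,\mathbf{Y}_i^{H}[l]\,\mathbf{F}\mathbf{k}^{T}\mathbf{k}\mathbf{F}^{-1}\,\mathbf{Y}_j[l] = \lambda\,\mathbf{S}_{ij}[k-1] + (1-\lambda)\,\mathbf{Y}_i^{H}[k]\,\mathbf{F}\mathbf{k}^{T}\mathbf{k}\mathbf{F}^{-1}\,\mathbf{Y}_j[k]$

If noise detected: $\mathbf{Y}_i[k] = \mathbf{Y}_i^{n}[k]$

$\mathbf{S}_{ij}^{n}[k] = (1-\lambda)\sum_{l=0}^{k}\lambda^{k-l}\,\mathbf{Y}_i^{n,H}[l]\,\mathbf{F}\mathbf{k}^{T}\mathbf{k}\mathbf{F}^{-1}\,\mathbf{Y}_j^{n}[l] = \lambda\,\mathbf{S}_{ij}^{n}[k-1] + (1-\lambda)\,\mathbf{Y}_i^{n,H}[k]\,\mathbf{F}\mathbf{k}^{T}\mathbf{k}\mathbf{F}^{-1}\,\mathbf{Y}_j^{n}[k]$

Update formula (only during noise-only periods):

$\mathbf{R}_i[k] = \frac{1}{\mu}\sum_{j=M-N}^{M-1}\left[\mathbf{S}_{ij}[k] - \mathbf{S}_{ij}^{n}[k]\right]\mathbf{W}_j[k],\ i = M-N \ldots M-1$
$\mathbf{W}_i[k+1] = \mathbf{W}_i[k] + \mathbf{F}\mathbf{g}\mathbf{F}^{-1}\boldsymbol{\Lambda}[k]\left\{\mathbf{Y}_i^{n,H}[k]\,\mathbf{E}[k] - \mathbf{R}_i[k]\right\}$

with

$\boldsymbol{\Lambda}[k] = \frac{2\rho'}{L}\,\mathrm{diag}\{P_0^{-1}[k], \ldots, P_{2L-1}^{-1}[k]\}$
$P_m[k] = \gamma\,P_m[k-1] + (1-\gamma)\left(P_{1,m}[k] + P_{2,m}[k]\right),\ m = 0 \ldots 2L-1$
$P_{1,m}[k] = \sum_{j=M-N}^{M-1}\left|Y_{j,m}^{n}[k]\right|^{2},\ m = 0 \ldots 2L-1$
$P_{2,m}[k] = […]$ (a sum over $j = M-N \ldots M-1$)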
[0092] Table 2 summarises the computational complexity and the memory usage of the frequency-domain NLMS-based SPA for implementing the QIC-GSC and the frequency-domain stochastic gradient algorithms for implementing the SP-SDW-MWF (Algorithm 2 and Algorithm 4). The computational complexity is again expressed as the number of Mega operations per second (Mops), while the memory usage is expressed in kWords. The following parameters have been used: M=3, L=32, fs=16 kHz, L_buf1=10000, (a) N=M-1, (b) N=M. From this table the following conclusions can be drawn:

• The computational complexity of the SP-SDW-MWF (Algorithm 2) with filter w0 is about twice the complexity of the QIC-GSC (and even less if the filter w0 is not used). The approximation of the regularisation term in Algorithm 4 further reduces the computational complexity. However, this only remains true for a small number of input channels, since the approximation introduces a quadratic term O(N²).

• Due to the storage of data samples in the circular speech + noise buffer B1, the memory usage of the SP-SDW-MWF (Algorithm 2) is quite high in comparison with the QIC-GSC (depending on the size of the data buffer L_buf1 of course). By using the approximation of the regularisation term in Algorithm 4, the memory usage can be reduced drastically, since now diagonal correlation matrices instead of data buffers need to be stored. Note however that also for the memory usage a quadratic term O(N²) is present.

Table 2

Computational complexity

Algorithm | Update formula | Step size adaptation | Mops
NLMS based SPA | […] + (6M-2) log2 2L MAC + 1/L Sq + 1/L D | (2M + 2) MAC + 1 D | 2.16
SG with LP (Algorithm 2) | (26N + 4 - […]) + (6N + 10) log2 2L MAC | (4N + 6) MAC + 1 D + 1 Abs | 3.22 (a), 4.27 (b)
SG with correlation matrices (Algorithm 4) | […] + (6N + 4) log2 2L MAC | (2N + 4) MAC + 1 D + 1 Abs | 2.71 (a), 4.31 (b)

Memory usage

Algorithm | Memory usage | kWords
NLMS based SPA | 4(M-1)L + 6L | 0.45
SG with LP (Algorithm 2) | 2N·L_buf1 + 6LN + 7L | 40.61 (a), 60.80 (b)
SG with correlation matrices (Algorithm 4) | 4LN² + 6LN + 7L | […]



[0093] It is now shown that practically no performance difference exists between Algorithm 2 and Algorithm 4, such that the SP-SDW-MWF using the implementation with (diagonal) correlation matrices still preserves its robustness benefit over the GSC (and the QIC-GSC). The same set-up has been used as for the previous experiments.

The performance of the stochastic gradient algorithms in the frequency-domain is evaluated for a filter length L=32 per channel, ρ'=0.8, γ=0.95 and […]. For all considered algorithms, filter adaptation only takes place during noise only periods. To exclude the effect of the spatial pre-processor, the performance measures are calculated with respect to the output of the fixed beamformer. The sensitivity of the algorithms against errors in the assumed signal model is illustrated for microphone mismatch, i.e. a gain mismatch Υ2=4 dB at the second microphone.

[0094] Fig. 15 and Fig. 16 depict the SNR improvement ΔSNR_intellig and the speech distortion SD_intellig of the SP-SDW-MWF (with w0) and the SDR-GSC (without w0), implemented using Algorithm 2 (solid line) and Algorithm 4 (dashed line), as a function of the trade-off parameter 1/μ. These figures also depict the effect of a gain mismatch Υ2=4 dB at the second microphone. From these figures it can be observed that approximating the regularisation term in the frequency-domain only results in a small performance difference. For most scenarios the performance is even better (i.e. larger SNR improvement and smaller speech distortion) for Algorithm 4 than for Algorithm 2.

[0095] Hence, also when implementing the SP-SDW-MWF using the proposed Algorithm 4, it still preserves its robustness benefit over the GSC (and the QIC-GSC). E.g. it can be observed that the GSC (i.e. SDR-GSC with 1/μ=0) will result in a large speech distortion (and a smaller SNR improvement) when microphone mismatch occurs. Both the SDR-GSC and the SP-SDW-MWF add robustness to the GSC, i.e. the distortion decreases for increasing 1/μ. The performance of the SP-SDW-MWF (with w0) is again hardly affected by microphone mismatch.



